Using a static naming approach to implement remote scope promotion

Yılmazer Metin, Ayşe

doi:10.55730/1300-0632.3903

Yılmazer Metin A.

Turkish Journal of Electrical Engineering and Computer Sciences, vol. 30, no. 5, pp. 1758-1772, 2022 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 30 Issue: 5
Publication Date: 2022
DOI: 10.55730/1300-0632.3903
Journal Name: Turkish Journal of Electrical Engineering and Computer Sciences
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
Page Numbers: pp. 1758-1772
Keywords: Asymmetric synchronization, GPUs, remote scope promotion, work-stealing
Affiliated with İstanbul Teknik Üniversitesi: Yes

Abstract

© TÜBİTAK. GPUs employ simple coherence mechanisms and require explicit use of costly synchronization operations to preserve data integrity. Local-scoped synchronization can lower the performance penalty of synchronization when sharing stays within a subgroup of threads. Unfortunately, asymmetric sharing (an important dynamic sharing pattern) requires global-scoped synchronization because remote sharers may access the data. Remote Scope Promotion (RSP) was introduced to take advantage of local-scoped synchronization for regular accesses while using scope promotion for occasional remote accesses. The first implementation of RSP uses a simple approach that performs costly cache operations on all L1 data caches when implementing scope promotion, and it therefore performs poorly on large-scale GPU systems. We present nRSP, which uses a static naming mechanism to identify the regularly accessing agent in asymmetric sharing and avoids applying costly coherence actions to every L1 data cache when implementing scope promotion. We evaluate nRSP using the timing-detailed Gem5-APU simulator, modeling a GPU system with 128 Compute Units, and show that nRSP greatly lowers remote synchronization overhead and considerably improves performance. On average, nRSP provides around 28% speedup on a 128 Compute Unit GPU device.
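
The difference between the two promotion strategies described in the abstract can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the paper's implementation: the names `L1Cache`, `promote_broadcast`, `promote_named`, and the `owner_of` table are hypothetical, each costly cache operation is reduced to a single counted `flush`, and the static name is assumed to map one-to-one to the Compute Unit that regularly accesses the data.

```python
from dataclasses import dataclass


@dataclass
class L1Cache:
    """Toy stand-in for one Compute Unit's L1 data cache (hypothetical)."""
    cu_id: int
    flushes: int = 0

    def flush(self) -> None:
        # Stands in for the costly coherence action applied on promotion.
        self.flushes += 1


def promote_broadcast(caches: list[L1Cache]) -> int:
    """Baseline-style sketch: the promotion touches every L1 cache."""
    for cache in caches:
        cache.flush()
    return len(caches)


def promote_named(caches: list[L1Cache], owner_of: dict[str, int], name: str) -> int:
    """Naming-style sketch: a static name identifies the regular accessor,
    so only that Compute Unit's L1 cache is touched."""
    caches[owner_of[name]].flush()
    return 1


NUM_CUS = 128  # matches the 128 Compute Unit system in the abstract
caches = [L1Cache(cu_id=i) for i in range(NUM_CUS)]
# Assumed static naming: each shared queue is named after its regular accessor.
owner_of = {f"queue{i}": i for i in range(NUM_CUS)}

broadcast_ops = promote_broadcast(caches)
named_ops = promote_named(caches, owner_of, "queue7")
print(broadcast_ops, named_ops)
```

In this model the number of cache operations per remote access grows with the number of Compute Units under the broadcast strategy and stays constant under the named one, which is the scaling argument the abstract makes. The model says nothing about actual timing.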