Using a static naming approach to implement remote scope promotion

Creative Commons License

Yılmazer Metin A.

Turkish Journal of Electrical Engineering and Computer Sciences, vol.30, no.5, pp.1758-1772, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 30 Issue: 5
  • Publication Date: 2022
  • DOI Number: 10.55730/1300-0632.3903
  • Journal Name: Turkish Journal of Electrical Engineering and Computer Sciences
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.1758-1772
  • Keywords: Asymmetric synchronization, GPUs, remote scope promotion, work-stealing
  • Istanbul Technical University Affiliated: Yes


© TÜBİTAK. GPUs employ simple coherence mechanisms and require explicit use of costly synchronization operations to preserve data integrity. Local-scoped synchronization can lower the performance penalty of synchronization when sharing is confined to a subgroup of threads. Unfortunately, asymmetric sharing, an important dynamic sharing pattern, requires global-scoped synchronization because remote sharers may access the data. Remote Scope Promotion (RSP) was introduced to take advantage of local-scoped synchronization for regular accesses while using scope promotion for occasional remote accesses. The first implementation of RSP uses a simple approach that performs costly cache operations on all L1 data caches when implementing scope promotion, and it therefore performs poorly on large-scale GPU systems. We present nRSP, which uses a static naming mechanism to identify the regularly accessing agent in asymmetric sharing and avoids applying costly coherence actions to every L1 data cache when implementing scope promotion. We evaluate nRSP using the timing-detailed Gem5-APU simulator, modeling a GPU system with 128 Compute Units, and show that nRSP greatly lowers remote synchronization overhead and considerably improves performance. On average, nRSP provides around 28% speedup on a 128 Compute Unit GPU device.
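
The contrast drawn in the abstract, between promoting scope by acting on every L1 data cache and acting only on the cache of a statically named owner, can be illustrated with a toy counting model. The sketch below is not the paper's implementation or its simulator setup: the `ToyGPU` class, the `owner_of` name table, the queue and Compute Unit ids, and the assumption that a named promotion touches exactly one L1 cache are all hypothetical simplifications. It only counts how many L1 caches a remote access would touch, and the counts say nothing about the reported speedup.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyGPU:
    """Toy model with one L1 data cache per Compute Unit (CU)."""
    num_cus: int
    l1_ops: int = 0  # running count of L1 cache operations (flush/invalidate)
    # Hypothetical static name table: shared queue id -> CU that regularly accesses it.
    owner_of: Dict[int, int] = field(default_factory=dict)

    def register_owner(self, queue_id: int, cu: int) -> None:
        """Statically name the regularly accessing CU for a queue."""
        if not 0 <= cu < self.num_cus:
            raise ValueError("CU id out of range")
        self.owner_of[queue_id] = cu

    def promote_broadcast(self, queue_id: int) -> List[int]:
        """Promotion that acts on every L1 data cache (the simple approach)."""
        targets = list(range(self.num_cus))
        self.l1_ops += len(targets)
        return targets

    def promote_named(self, queue_id: int) -> List[int]:
        """Promotion that acts only on the named owner's L1 (assumed behavior)."""
        targets = [self.owner_of[queue_id]]
        self.l1_ops += len(targets)
        return targets


def count_l1_ops(num_cus: int, remote_accesses: int, named: bool) -> int:
    """Count L1 operations caused by a number of remote (e.g. stealing) accesses."""
    gpu = ToyGPU(num_cus)
    gpu.register_owner(queue_id=0, cu=3)  # CU 3 is an arbitrary illustrative owner
    for _ in range(remote_accesses):
        if named:
            gpu.promote_named(0)
        else:
            gpu.promote_broadcast(0)
    return gpu.l1_ops


if __name__ == "__main__":
    print("broadcast:", count_l1_ops(128, 10, named=False))
    print("named:    ", count_l1_ops(128, 10, named=True))
```

In this model the broadcast cost grows with the number of Compute Units while the named cost stays constant per remote access, which is the scaling concern the abstract raises for large GPU systems.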