Title :
Distance-aware round-robin mapping for large NUCA caches
Author :
Ros, Alberto ; Cintra, Marcelo ; Acacio, Manuel E. ; García, José M.
Author_Institution :
Dept. de Ing. y Tecnol. de Comput., Univ. de Murcia, Murcia, Spain
Abstract :
In many-core architectures, memory blocks are commonly assigned to the banks of a NUCA cache by following a physical mapping. This mapping assigns blocks to cache banks in a round-robin fashion, thus neglecting the distance between the cores that most frequently access each block and the NUCA bank that holds it. This issue impacts both cache access latency and the amount of on-chip network traffic generated. On the other hand, first-touch mapping policies, which take distance into account, can lead to an unbalanced utilization of cache banks and, consequently, to an increased number of expensive off-chip accesses. In this work, we propose the distance-aware round-robin mapping policy, an OS-managed policy that addresses the trade-off between cache access latency and the number of off-chip accesses. Our policy tries to map the pages accessed by a core to its closest (local) bank, as in a first-touch policy. However, our policy also introduces an upper bound on the deviation of the distribution of memory pages among cache banks, which lessens the number of off-chip accesses. This trade-off is addressed without requiring any extra hardware structure. We also show that the private cache indexing commonly used in many-core architectures is not the most appropriate for OS-managed distance-aware mapping policies, and propose to employ different bits for such indexing. Using the GEMS simulator, we show that our proposal obtains average improvements in execution time of 11% for parallel applications and 14% for multi-programmed workloads, along with significant reductions in network traffic, over a traditional physical mapping. Moreover, when compared to a first-touch mapping policy, our proposal improves average execution time by 5% for parallel applications and 6% for multi-programmed workloads, while slightly increasing on-chip network traffic.
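The mapping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DistanceAwareMapper`, the square-mesh layout with one bank per tile, the Manhattan-distance metric, and the way the bound is expressed (`max_deviation`, the largest allowed gap between any bank's page count and that of the least-loaded bank) are all assumptions made for illustration. On the first touch of a page, the sketch picks the closest bank that can still accept a page without exceeding the bound.

```python
from dataclasses import dataclass, field


@dataclass
class DistanceAwareMapper:
    """Hypothetical OS-side page-to-bank mapper for a square tiled mesh."""
    mesh_width: int      # tiles per row; bank i is assumed to sit in tile i
    max_deviation: int   # assumed bound (>= 1) on a bank's lead over the emptiest bank
    pages_per_bank: list = field(default_factory=list)
    page_to_bank: dict = field(default_factory=dict)

    def __post_init__(self):
        self.pages_per_bank = [0] * (self.mesh_width ** 2)

    def _distance(self, a, b):
        # Manhattan hop count between tiles a and b on the mesh.
        ax, ay = a % self.mesh_width, a // self.mesh_width
        bx, by = b % self.mesh_width, b // self.mesh_width
        return abs(ax - bx) + abs(ay - by)

    def map_page(self, page, core):
        """Return the bank for `page`, assigning one on first touch by `core`."""
        if page in self.page_to_bank:
            return self.page_to_bank[page]
        floor = min(self.pages_per_bank)
        # Banks that can take one more page without exceeding the bound.
        eligible = [b for b, n in enumerate(self.pages_per_bank)
                    if n - floor < self.max_deviation]
        # Closest eligible bank; ties are broken by the lowest bank id.
        bank = min(eligible, key=lambda b: (self._distance(core, b), b))
        self.pages_per_bank[bank] += 1
        self.page_to_bank[page] = bank
        return bank
```

With `mesh_width=2` and `max_deviation=2`, the first two pages touched by core 0 go to its local bank 0, and the third spills to a neighbouring bank because bank 0 has reached the bound. A large `max_deviation` behaves like first-touch, while a small one approaches an even round-robin distribution.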
Keywords :
cache storage; multiprogramming; operating systems (computers); parallel processing; GEMS simulator; OS-managed policy; cache access latency; cache banks; distance-aware round-robin mapping; first-touch mapping policy; many-core architectures; memory blocks; multiprogrammed workloads; nonuniform cache architecture; on-chip network traffic; parallel application; physical mapping; Computer architecture; Delay; Indexing; Informatics; Memory architecture; Organizing; Physics computing; Proposals; Telecommunication traffic; Tiles;
Conference_Title :
High Performance Computing (HiPC), 2009 International Conference on
Conference_Location :
Kochi
Print_ISBN :
978-1-4244-4922-4
Electronic_ISBN :
978-1-4244-4921-7
DOI :
10.1109/HIPC.2009.5433220