DocumentCode :
3457103
Title :
Using Aggressor Thread Information to Improve Shared Cache Management for CMPs
Author :
Liu, Wanli ; Yeung, Donald
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Maryland at Coll. Park, College Park, MD, USA
fYear :
2009
fDate :
12-16 Sept. 2009
Firstpage :
372
Lastpage :
383
Abstract :
Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement decisions. But under high cache interference, LRU performs poorly because some memory-intensive threads, or aggressor threads, allocate cache that could be more gainfully used by other (less memory-intensive) threads. Techniques like cache partitioning can address this problem by performing explicit allocation to prevent aggressor threads from taking over the cache. Whether implicit or explicit, the key factor controlling cache allocation is victim thread selection. The choice of victim thread relative to the cache-missing thread determines each cache miss's impact on cache allocation: if the two are the same, allocation doesn't change, but if the two are different, then one cache block shifts from the victim thread to the cache-missing thread. In this paper, we study an omniscient policy, called ORACLE-VT, that uses off-line information to always select the best victim thread, and hence, maintain the best per-thread cache allocation at all times. We analyze ORACLE-VT, and find it victimizes aggressor threads about 80% of the time. To see if we can approximate ORACLE-VT, we develop AGGRESSOR-VT, a policy that probabilistically victimizes aggressor threads with strong bias. Our results show AGGRESSOR-VT comes close to ORACLE-VT's miss rate, achieving three-quarters of its gain over LRU and roughly half of its gain over an ideal cache partitioning technique. To make AGGRESSOR-VT feasible for real systems, we develop a sampling algorithm that "learns" the identity of aggressor threads via runtime performance feedback. We also modify AGGRESSOR-VT to permit adjusting the probability for victimizing aggressor threads, and use our sampling algorithm to learn the per-thread victimization probabilities that optimize system performance (e.g., weighted IPC). We call this policy AGGRESSORpr-VT. Our results show AGGRESSORpr-VT outperforms LRU, UCP, and an ideal cache way partitioning technique by 4.86%, 3.15%, and 1.09%, respectively.
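The abstract's core mechanism is victim thread selection biased toward aggressor threads. The following is a minimal C sketch of what such an AGGRESSOR-VT-style selection step could look like; the names (select_victim, is_aggressor, victimize_prob, WAYS) and the structure are illustrative assumptions, not the paper's implementation. On a miss, with some probability the victim is the LRU block owned by an aggressor thread; otherwise the policy falls back to plain LRU.

/* Hypothetical sketch of probabilistic aggressor-biased victim selection
 * within one cache set; not the authors' code. */
#include <stdbool.h>
#include <stdlib.h>

#define WAYS 8

typedef struct {
    int owner_thread;   /* thread that allocated this block */
    unsigned lru_age;   /* larger value = less recently used */
    bool valid;
} CacheBlock;

/* Returns the way index to evict from one cache set. */
int select_victim(CacheBlock set[WAYS],
                  const bool is_aggressor[],   /* per-thread flags learned at runtime */
                  double victimize_prob)       /* bias toward aggressor threads */
{
    int lru_way = 0, lru_aggr_way = -1;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                          /* fill an invalid way first */
        if (set[w].lru_age > set[lru_way].lru_age)
            lru_way = w;
        if (is_aggressor[set[w].owner_thread] &&
            (lru_aggr_way < 0 ||
             set[w].lru_age > set[lru_aggr_way].lru_age))
            lru_aggr_way = w;
    }
    /* With strong bias, victimize the aggressor thread's LRU block. */
    if (lru_aggr_way >= 0 && (double)rand() / RAND_MAX < victimize_prob)
        return lru_aggr_way;
    return lru_way;                            /* fall back to global LRU */
}

In AGGRESSORpr-VT as described above, victimize_prob would be a per-thread value tuned at runtime by the sampling algorithm to maximize a metric such as weighted IPC.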
Keywords :
cache storage; microprocessor chips; multi-threading; probability; sampling methods; shared memory systems; AGGRESSOR-VT; CMP performance; ORACLE-VT; aggressor thread; cache partitioning; per-thread victimization probability; runtime performance feedback; sampling algorithm; shared cache allocation; shared cache management; Concurrent computing; Conference management; Educational institutions; Engineering management; Interference; Parallel architectures; Partitioning algorithms; Resource management; Sampling methods; Yarn; aggressor thread; cache partitioning; memory interleaving; shared cache management;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel Architectures and Compilation Techniques, 2009. PACT '09. 18th International Conference on
Conference_Location :
Raleigh, NC
ISSN :
1089-795X
Print_ISBN :
978-0-7695-3771-9
Type :
conf
DOI :
10.1109/PACT.2009.13
Filename :
5260530