DocumentCode :
1564758
Title :
Fair cache sharing and partitioning in a chip multiprocessor architecture
Author :
Kim, Seongbeom ; Chandra, Dhruba ; Solihin, Yan
Author_Institution :
Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
fYear :
2004
Firstpage :
111
Lastpage :
122
Abstract :
This paper presents a detailed study of fairness in cache sharing between threads in a chip multiprocessor (CMP) architecture. Prior work in CMP architectures has only studied throughput optimization techniques for a shared cache. The issue of fairness in cache sharing, and its relation to throughput, has not been studied. Fairness is a critical issue because the operating system (OS) thread scheduler's effectiveness depends on the hardware to provide fair cache sharing to co-scheduled threads. Without such hardware, serious problems, such as thread starvation and priority inversion, can arise and render the OS scheduler ineffective. This paper makes several contributions. First, it proposes and evaluates five cache fairness metrics that measure the degree of fairness in cache sharing, and shows that two of them correlate very strongly with execution-time fairness. Execution-time fairness is defined as how uniformly the execution times of co-scheduled threads change, where each change is relative to the execution time of the same thread running alone. Second, using the metrics, the paper proposes static and dynamic L2 cache partitioning algorithms that optimize fairness. The dynamic partitioning algorithm is easy to implement, requires little or no profiling, has low overhead, and does not restrict the cache replacement algorithm to LRU. The static algorithm, although requiring the cache to maintain LRU stack information, can help the OS thread scheduler avoid cache thrashing. Finally, this paper studies the relationship between fairness and throughput in detail. We found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness. Using a set of co-scheduled pairs of benchmarks, our algorithms improve fairness by an average factor of 4×, while increasing throughput by 15%, compared to a nonpartitioned shared cache.
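The abstract's notion of execution-time fairness can be illustrated with a short sketch. This is not code from the paper: the function names and the spread-of-slowdowns measure below are illustrative assumptions, capturing only the stated definition that fairness means the per-thread slowdowns (co-scheduled time relative to running alone) are uniform.

```python
# Illustrative sketch (not the paper's implementation): execution-time
# fairness as uniformity of per-thread slowdowns under co-scheduling.
# slowdown_i = T_shared_i / T_alone_i; perfectly fair => all slowdowns equal.

def slowdowns(t_alone, t_shared):
    """Per-thread slowdown ratios relative to running alone."""
    return [shared / alone for alone, shared in zip(t_alone, t_shared)]

def fairness_spread(t_alone, t_shared):
    """Spread of slowdowns: 0.0 means perfectly uniform (fair) sharing."""
    ratios = slowdowns(t_alone, t_shared)
    return max(ratios) - min(ratios)

# Hypothetical timings: thread 0 slows down 1.2x, thread 1 slows down 2.0x,
# e.g. because thread 1 loses most of its shared L2 capacity.
print(fairness_spread([10.0, 10.0], [12.0, 20.0]))  # 0.8
```

Under this view, a fairness-optimizing cache partition tries to drive the spread toward zero, whereas a pure throughput optimizer may happily starve one thread if the other makes better use of the cache.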
Keywords :
benchmark testing; cache storage; multi-threading; multiprocessing systems; operating systems (computers); parallel architectures; processor scheduling; cache partitioning algorithms; cache sharing; chip multiprocessor architecture; dynamic partitioning algorithm; execution-time fairness; operating system thread scheduler; optimization techniques; Computer architecture; Hardware; Heuristic algorithms; Operating systems; Partitioning algorithms; Resource management; Scheduling algorithm; Surface-mount technology; Throughput; Yarn;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT 2004)
ISSN :
1089-795X
Print_ISBN :
0-7695-2229-7
Type :
conf
DOI :
10.1109/PACT.2004.1342546
Filename :
1342546