Regional Information Center for Science and Technology - Predicting inter-thread cache contention on a chip multi-processor architecture

DocumentCode :

2424898

Title :

Predicting inter-thread cache contention on a chip multi-processor architecture

Author :

Chandra, Dhruba ; Guo, Fei ; Kim, Seongbeom ; Solihin, Yan

Author_Institution :

Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA

fYear :

2005

fDate :

12-16 Feb. 2005

Firstpage :

340

Lastpage :

351

Abstract :

This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads nonuniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the inductive probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.
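The abstract names the stack distance profile as a model input without defining it. As a rough illustration only, the sketch below computes the conventional LRU stack distance profile for a single cache set and shows how the miss count grows when a thread keeps fewer ways. It is not a reproduction of any of the paper's three models; the function names, the toy trace, and the single-set simplification are all assumptions made for this example.

```python
def stack_distance_profile(trace, assoc):
    """Return counters [C1, ..., C_assoc, C_miss] for one cache set under LRU.

    Hypothetical helper for illustration; `trace` is a sequence of cache-line
    identifiers that all map to the same set, `assoc` is the number of ways.
    """
    stack = []                      # LRU stack, most recently used at index 0
    counters = [0] * (assoc + 1)    # last slot counts misses
    for line in trace:
        if line in stack:
            depth = stack.index(line)   # 0-based position in the LRU stack
            stack.remove(line)
            if depth < assoc:
                counters[depth] += 1    # hit at stack position depth + 1
            else:
                counters[assoc] += 1    # reuse too deep to still be cached
        else:
            counters[assoc] += 1        # first touch: cold miss
        stack.insert(0, line)           # accessed line becomes most recent
    return counters


def misses_with_ways(counters, ways):
    """Misses the thread would see if it only retained `ways` ways (assumed)."""
    return sum(counters[ways:])


if __name__ == "__main__":
    profile = stack_distance_profile(["A", "B", "A", "C", "B", "A"], assoc=2)
    alone = misses_with_ways(profile, 2)    # thread has the whole set
    shared = misses_with_ways(profile, 1)   # thread keeps only one way
    print(profile, alone, shared, shared - alone)
```

Hits recorded at deeper stack positions are the ones that turn into misses when a co-scheduled thread takes space away, which is why a profile collected in isolation carries information about sharing. How much space each thread actually retains under sharing is the hard part, and this sketch does not attempt to estimate it.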

Keywords :

cache storage; computational complexity; computer architecture; microprocessor chips; multi-threading; multiprocessing systems; L2 cache misses; L2 cache sharing; chip multiprocessor architecture; circular sequence profile; circular sequence thread profile; computational complexity; coscheduled thread; dual-core CMP architecture; inductive probability model; interthread cache; isolated L2 cache stack distance; nonuniform threading; temporal reuse behavior; Accuracy; Analytical models; Art; Bars; Career development; Computer architecture; Hardware; Predictive models; Throughput; Yarn;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on

ISSN :

1530-0897

Print_ISBN :

0-7695-2275-0

Type :

conf

DOI :

10.1109/HPCA.2005.27

Filename :

1385956

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2424898