DocumentCode :
128987
Title :
Introducing Thread Criticality awareness in Prefetcher Aggressiveness Control
Author :
Panda, Biplab ; Balachandran, Shankar
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India
fYear :
2014
fDate :
24-28 March 2014
Firstpage :
1
Lastpage :
6
Abstract :
A single parallel application running on a multi-core system shows sub-linear speedup because of slow progress of one or more threads known as critical threads. Some of the reasons for the slow progress of threads are (1) load imbalance, (2) frequent cache misses and (3) effect of synchronization primitives. Identifying critical threads and minimizing their cache miss latencies can improve the overall execution time of a program. One way to hide and tolerate the cache misses is through hardware prefetching. Hardware prefetching is one of the most commonly used memory latency hiding techniques. Previous studies have shown the effectiveness of hardware prefetchers for multiprogrammed workloads (multiple sequential applications running independently on different cores). In contrast to multiprogrammed workloads, the performance of a single parallel application depends on the progress of slow progress(critical) threads. This paper introduces a prefetcher aggressiveness control mechanism called Thread Criticality-aware Prefetcher Aggressiveness Control (TCPAC). TCPAC controls the aggressiveness of the prefetchers at the L2 prefetching controllers (known as TCPAC-P), DRAM controller (known as TCPAC-D) and at the Last Level Cache (LLC) controller (known as TCPAC-C) using prefetch accuracy and thread progress. Each TCPAC sub-technique outperforms the respective state-of-the-art techniques such as HPAC [2], PADC [4], and PACMan [3] and the combination of all the TCPAC sub-techniques named as TCPAC-PDC outperforms the combination of HPAC, PADC, and PACMan. On an average, on a 8 core system, in terms of improvement in execution time, TCPAC-PDC outperforms the combination of HPAC, PADC, and PACMan by 7.61%. For 12 and 16 cores, TCPAC-PDC beats the state-of-the-art combinations by 7.21% and 8.32% respectively.
Keywords :
multi-threading; multiprocessing systems; storage management; DRAM controller; L2 prefetching controllers; LLC controller; TCPAC mechanism; TCPAC-C; TCPAC-D; TCPAC-P; cache miss; cache miss latencies minimization; critical threads identification; hardware prefetching; last level cache controller; load imbalance; memory latency hiding techniques; multicore system; multiprogrammed workloads; prefetch accuracy; prefetcher aggressiveness control; program execution time; single parallel application; synchronization primitives; thread criticality awareness; thread progress; Accuracy; Hardware; Measurement; Message systems; Prefetching; Random access memory;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014
Conference_Location :
Dresden
Type :
conf
DOI :
10.7873/DATE.2014.092
Filename :
6800293
Link To Document :
بازگشت