DocumentCode :
1799909
Title :
Compiler Support for Optimizing Memory Bank-Level Parallelism
Author :
Ding, Wei ; Guttman, Diana ; Kandemir, Mahmut
Author_Institution :
Pennsylvania State Univ., University Park, PA, USA
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
571
Lastpage :
582
Abstract :
Many prior compiler-based optimization schemes focused exclusively on cache data locality. However, cache locality is only one factor in the overall performance of applications running on emerging multicores or manycores. For example, memory stalls can constitute a very large fraction of execution time even in cache-optimized codes, and one of the main reasons for this is a lack of memory-level parallelism. Motivated by this, we propose a compiler-based Bank-Level Parallelism (BLP) optimization scheme that uses loop tile scheduling. More specifically, we first use Cache Miss Equations to predict where last-level cache misses will happen in each tile, and then identify the set of memory banks that will be accessed in each tile. Using this information, we propose two tile scheduling algorithms to maximize BLP, each targeting a different scenario. We further discuss how our compiler-based scheme can be enhanced to consider memory controller-level parallelism and row-buffer locality. Our experimental evaluation using 11 multithreaded applications shows that the proposed BLP optimization improves BLP by 17.1% on average, resulting in a 9.2% reduction in average memory access latency. Furthermore, considering memory controller-level parallelism and row-buffer locality (in addition to BLP) takes our average improvement in memory access latency to 22.2%.
Keywords :
cache storage; multi-threading; multiprocessing systems; optimising compilers; program control structures; scheduling; BLP optimization scheme; cache data locality; cache miss equations; compiler support; compiler-based optimization schemes; last-level cache; loop tile scheduling; memory access latency; memory bank-level parallelism optimization; memory controller-level parallelism; multicores; multithreaded applications; row-buffer locality; Arrays; Optimization; Parallel processing; Random access memory; Schedules; Scheduling; Vectors; bank-level parallelism; compiler; memory controller-level parallelism; row-buffer locality;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.34
Filename :
7011418