DocumentCode
3370429
Title
Code transformations to improve memory parallelism
Author
Pai, Vijay S.; Adve, Sarita
Author_Institution
Dept. of Electr. & Comput. Eng., Rice Univ., Houston, TX, USA
fYear
1999
fDate
1999
Firstpage
147
Lastpage
155
Abstract
Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than in previous-generation systems. These deficiencies arise largely because applications present limited opportunities for an out-of-order issue processor to overlap multiple read misses, the dominant source of memory stalls. This work proposes code transformations to increase parallelism in the memory system by overlapping multiple read misses within the same instruction window, while preserving cache locality. We present an analysis and transformation framework suitable for compiler implementation. Our simulation experiments show substantial increases in memory parallelism, leading to execution time reductions averaging 23% in a multiprocessor and 30% in a uniprocessor. We see similar benefits on a Convex Exemplar.
Keywords
cache storage; memory architecture; parallel architectures; performance evaluation; Convex Exemplar; cache locality; code transformations; compiler implementation; instruction-level parallelism; memory parallelism; microprocessors; multiple read misses; multiprocessor; simulation experiments; Computer aided instruction; Computer science; Concurrent computing; Delay; Microprocessors; Multiprocessing systems; Out of order; Parallel processing; Pipeline processing; Prefetching
fLanguage
English
Publisher
IEEE
Conference_Titel
Microarchitecture, 1999. MICRO-32. Proceedings. 32nd Annual International Symposium on
Conference_Location
Haifa
ISSN
1072-4451
Print_ISBN
0-7695-0437-X
Type
conf
DOI
10.1109/MICRO.1999.809452
Filename
809452