DocumentCode :
1565895
Title :
Guided region prefetching: a cooperative hardware/software approach
Author :
Wang, Zhenlin ; Burger, Doug ; McKinley, Kathryn S. ; Reinhardt, Steven K. ; Weems, Charles C.
Author_Institution :
Dept. of Comput. Sci., Massachusetts Univ., Amherst, MA, USA
fYear :
2003
Firstpage :
388
Lastpage :
398
Abstract :
Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes have been proposed to tolerate these latencies. Software prefetching typically provides better prefetch accuracy than hardware, but is limited by prefetch instruction overheads and the compiler's limited ability to schedule prefetches sufficiently far in advance to cover level-two cache miss latencies. Hardware prefetching can be effective at hiding these large latencies, but generates many useless prefetches and consumes considerable memory bandwidth. We propose a cooperative hardware-software prefetching scheme called guided region prefetching (GRP), which uses compiler-generated hints encoded in load instructions to regulate an aggressive hardware prefetching engine. We compare GRP against a sophisticated pure hardware stride prefetcher and a scheduled region prefetching (SRP) engine. SRP and GRP show the best performance, with gains of 22% and 21%, respectively, over no prefetching, but SRP incurs 180% extra memory traffic, nearly tripling bandwidth requirements. GRP achieves performance close to SRP, but with a mere eighth of the extra prefetching traffic: a 23% increase over no prefetching. The GRP hardware-software collaboration thus combines the accuracy of compiler-based program analysis with the performance potential of aggressive hardware prefetching, bringing the performance gap relative to a perfect L2 cache below 20%.
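The abstract describes GRP as compiler hints, carried in load instructions, gating an otherwise aggressive region prefetch engine. The Python sketch below is a toy model under stated assumptions, not the authors' design: the hint values (`Hint.SPATIAL`, `Hint.NONE`), the 64-byte block and 4 KB region sizes, and the class `GuidedRegionPrefetcher` are all hypothetical. On an L2 miss it queues every other block in the aligned region, but only if the missing load carries a spatial hint; an unguided region engine would queue them on every miss.

```python
from collections import deque
from enum import Enum


class Hint(Enum):
    # Hypothetical hint values; the abstract says only that compiler hints
    # are encoded in load instructions, not what they look like.
    NONE = 0
    SPATIAL = 1


BLOCK_SIZE = 64      # assumed L2 block size in bytes
REGION_SIZE = 4096   # assumed prefetch region size in bytes


class GuidedRegionPrefetcher:
    """Toy model of a region prefetcher gated by compiler hints (hypothetical)."""

    def __init__(self, block_size=BLOCK_SIZE, region_size=REGION_SIZE):
        self.block_size = block_size
        self.region_size = region_size
        self.queue = deque()  # block addresses waiting to be prefetched

    def on_l2_miss(self, addr, hint):
        """Queue the rest of addr's region if the load is hinted; return count queued."""
        # An unguided engine would skip this check and prefetch on every miss,
        # which is where its extra memory traffic comes from.
        if hint is not Hint.SPATIAL:
            return 0
        miss_block = addr - addr % self.block_size    # block being demand-fetched
        region_base = addr - addr % self.region_size  # start of the aligned region
        issued = 0
        for block in range(region_base, region_base + self.region_size, self.block_size):
            # Skip the demand block and anything already queued.
            if block != miss_block and block not in self.queue:
                self.queue.append(block)
                issued += 1
        return issued
```

For example, `on_l2_miss(0x1234, Hint.NONE)` queues nothing, while `on_l2_miss(0x1234, Hint.SPATIAL)` queues the other 63 blocks of the region starting at `0x1000`. In this model the hint check is the only difference from an unguided engine, so traffic falls in proportion to how few loads the compiler marks. For the abstract's figures: 180% extra traffic means SRP moves about 2.8 times the baseline traffic (hence "nearly tripling"), and one eighth of 180% is 22.5%, consistent with the reported 23% for GRP.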
Keywords :
cache storage; instruction sets; performance evaluation; program compilers; storage management; GRP; SRP; cache memory; compiler-based program analysis; cooperative hardware-software approach; guided region prefetching; hardware prefetching; load instruction overhead; main-memory access latency; memory bandwidth; memory traffic; performance gain; prefetch scheduling; scheduled region prefetching; software prefetching; spatial data structure; Application software; Bandwidth; Collaboration; Delay; Engines; Hardware; Performance gain; Performance loss; Prefetching; Program processors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture, 2003. Proceedings. 30th Annual International Symposium on
ISSN :
1063-6897
Print_ISBN :
0-7695-1945-8
Type :
conf
DOI :
10.1109/ISCA.2003.1207016
Filename :
1207016