DocumentCode :
2518365
Title :
Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors
Author :
Luk, Chi-Keung ; Mowry, Todd C.
Author_Institution :
Dept. of Comput. Sci., Toronto Univ., Ont., Canada
fYear :
1998
fDate :
30 Nov-2 Dec 1998
Firstpage :
182
Lastpage :
193
Abstract :
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing prefetching schemes are insufficient for modern superscalar processors since they fail to issue prefetches early enough (particularly for non-sequential accesses). To overcome these limitations, we propose a new instruction prefetching technique whereby the hardware and software cooperate to hide the latency as follows. The hardware performs aggressive sequential prefetching combined with a novel prefetch filtering mechanism to allow it to get far ahead without polluting the cache. To hide the latency of non-sequential accesses, we propose and implement a novel compiler algorithm which automatically inserts instruction prefetch instructions into the executable to prefetch the targets of control transfers far enough in advance. Our experimental results demonstrate that this new approach results in speedups ranging from 9.4% to 18.5% (13.3% on average) over the original execution time on an out-of-order superscalar processor, which is more than double the average speedup of the best existing schemes (6.5%). This is accomplished by hiding an average of 71% of the original instruction stall time, compared with only 36% for the best existing schemes. We find that both the prefetch filtering and compiler-inserted prefetching components of our design are essential and complementary, that the compiler can limit the code expansion to less than 10% on average, and that our scheme is robust with respect to variations in miss latency and bandwidth.
Keywords :
computer architecture; performance evaluation; program compilers; cache miss latency; compiler; compiler-inserted prefetching; instruction prefetching; performance bottleneck; prefetch filtering; Application software; Computer science; Delay; Ear; Electronic switching systems; Filtering; Hardware; National electric code; Prefetching; Read only memory
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Microarchitecture, 1998. MICRO-31. Proceedings. 31st Annual ACM/IEEE International Symposium on
Conference_Location :
Dallas, TX
ISSN :
1072-4451
Print_ISBN :
0-8186-8609-X
Type :
conf
DOI :
10.1109/MICRO.1998.742780
Filename :
742780