DocumentCode :
3239313
Title :
Improving the effectiveness of software prefetching with adaptive executions
Author :
Saavedra, Rafael H. ; Park, Daeyeon
Author_Institution :
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
fYear :
1996
fDate :
Oct 1996
Firstpage :
68
Lastpage :
78
Abstract :
The effectiveness of software prefetching for tolerating latency depends mainly on the ability of programmers and/or compilers to: 1) predict in advance the magnitude of the run-time remote memory latency, and 2) insert prefetches at a distance that minimizes stall time without causing cache pollution. Scalable heterogeneous multiprocessors, such as networks of computers (NOWs), present special challenges to static software prefetching because on these systems the network topology and node configuration are not completely determined at compile time. Furthermore, dynamic software prefetching cannot do much better because individual nodes on large heterogeneous NOWs would tend to experience different remote memory delays over time. A fixed prefetch distance, even when computed at run time, cannot perform well for the whole duration of a software pipeline. Here we present an adaptive scheme for software prefetching that makes it possible for nodes to dynamically change not only the amount of prefetching but also the prefetch distance. Doing this makes it possible to tailor the execution of a software pipeline to the prevailing conditions affecting each node. We show how simple performance data collected by hardware monitors can allow programs to observe, evaluate, and change their prefetching policies. Our results show that, on the benchmarks we simulated, adaptive prefetching improved performance over static and dynamic prefetching by 10% to 60%. More importantly, future increases in the heterogeneity and size of NOWs will increase the advantages of adaptive prefetching over static and dynamic schemes.
Keywords :
computer architecture; fault tolerant computing; multiprocessing systems; performance evaluation; adaptive executions; cache pollution; fixed prefetch distance; latency tolerance; network of computers; performance data; run-time remote memory latency; scalable heterogeneous multiprocessors; software prefetching; Computer networks; Delay effects; Network topology; Pipelines; Pollution; Prefetching; Program processors; Programming profession; Runtime; Software performance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures and Compilation Techniques, 1996., Proceedings of the 1996 Conference on
Conference_Location :
Boston, MA
ISSN :
1089-795X
Print_ISBN :
0-8186-7633-7
Type :
conf
DOI :
10.1109/PACT.1996.552556
Filename :
552556