Title :
Overcoming limitations of prefetching in multiprocessors by compiler-initiated coherence actions
Author :
Skeppstedt, Jonas
Author_Institution :
Dept. of Comput. Eng., Chalmers Univ. of Technol., Goteborg, Sweden
Abstract :
In this paper we first identify limitations of compiler-controlled prefetching in a CC-NUMA multiprocessor with a write-invalidate cache coherence protocol. Compiler-controlled prefetch techniques for CC-NUMAs are often focused only on stride-accesses, and this introduces a major limitation. We consider combining prefetch with two other compiler-controlled techniques to partly remedy the situation: (1) load-exclusive to reduce write-latency and (2) store-update to reduce read-latency. The purpose of each of these techniques in a machine with prefetch is to reduce latency for accesses that the prefetch technique could not handle. We evaluate two different scenarios: first with a hybrid compiler/hardware prefetch technique and second with an optimal stride-prefetcher. We find that the combined gains under the hybrid prefetch technique are significant for the six applications we have studied. On average, 71% of the original write-stall time remains after using the hybrid prefetcher, and of these ownership-requests, 60% would be eliminated using load-exclusive. On average, 68% of the read-stall time remains after using the hybrid prefetcher, and of these read-misses, 34% were serviced by remote caches and would be converted by store-update into misses serviced by a clean copy in memory, which reduces the read-latency. With an optimal stride-prefetcher, our results show that it is beneficial to complement prefetch with the two techniques here as well.
Keywords :
memory architecture; parallel architectures; program compilers; storage management; CC-NUMA multiprocessor; compiler-analysis; compiler-controlled prefetching; compiler-initiated coherence; memory access latency reduction; migratory sharing; multiprocessors; prefetch; prefetching; read-latency; read-stall time; write-latency; Delay; Prefetching; Protocols; Read-write memory; Scalability;
Conference_Title :
Parallel Architectures and Compilation Techniques, 1997. Proceedings, 1997 International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-8186-8090-3
DOI :
10.1109/PACT.1997.644023