Title :
Combining loop fusion with prefetching on shared-memory multiprocessors
Author :
Manjikian, Naraig
Author_Institution :
Dept. of Electr. & Comput. Eng., Toronto Univ., Ont., Canada
Abstract :
The performance of programs consisting of parallel loops on shared-memory multiprocessors is limited by long memory latencies as processor speeds increase more rapidly than memory speeds. Two complementary techniques for addressing memory latency and improving performance are: (a) cache locality enhancement for latency reduction and (b) data prefetching for latency tolerance. This paper studies the benefit of combining loop fusion for locality enhancement with prefetching. Experimental results are reported for multiprocessors with support for prefetching. For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
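The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the paper's transformation or benchmark: the arrays, the loop bodies, the function names, and the prefetch distance PREFETCH_DIST are all hypothetical, and the prefetch uses the GCC/Clang builtin __builtin_prefetch rather than whatever mechanism the paper's compiler or hardware provided. Fusion is legal here only because element i of the second loop depends on element i of the first and on nothing else.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. The best value is
 * machine-dependent and would normally be tuned experimentally. */
#define PREFETCH_DIST 16

/* Unfused version: two separate passes. For large n, b[] may be
 * evicted from the cache between the first and second loop. */
void unfused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* Fused version with software prefetching: b[i] is reused while it
 * is still in cache (latency reduction), and data needed
 * PREFETCH_DIST iterations ahead is requested early (latency
 * tolerance). */
void fused_prefetch(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1); /* for read  */
            __builtin_prefetch(&b[i + PREFETCH_DIST], 1, 1); /* for write */
            __builtin_prefetch(&c[i + PREFETCH_DIST], 1, 1); /* for write */
        }
        b[i] = 2.0 * a[i];
        c[i] = b[i] + 1.0;
    }
}
```

Both functions compute the same results; only the memory access pattern differs. In a parallel setting each processor would run such a fused loop over its own block of iterations, which this sketch leaves out.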
Keywords :
cache storage; shared memory systems; software performance evaluation; SGI Power Challenge R10000; cache locality enhancement; data prefetching; latency reduction; long memory latencies; loop fusion; memory latency; parallel loops; prefetching; shared-memory multiprocessors; Concurrent computing; Delay; Filters; Fuses; Hardware; Jacobian matrices; Lapping; Microprocessors; Parallel processing; Prefetching
Conference_Title :
Proceedings of the 1997 International Conference on Parallel Processing
Conference_Location :
Bloomington, IL
Print_ISBN :
0-8186-8108-X
DOI :
10.1109/ICPP.1997.622560