Title :
Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes
Author :
Kohout, Nicholas ; Choi, Seungryul ; Kim, Dongkeun ; Yeung, Donald
Author_Institution :
Intel Corp., USA
Abstract :
Presents multi-chain prefetching, a technique that utilizes offline analysis and a hardware prefetch engine to prefetch multiple independent pointer chains simultaneously, thus exploiting inter-chain memory parallelism for the purpose of memory latency tolerance. This paper makes three contributions. First, we introduce a scheduling algorithm that identifies independent pointer chains in pointer-chasing codes and computes a prefetch schedule that overlaps serialized cache misses across separate chains. Our analysis focuses an static traversals. We also propose using speculation to identify independent pointer chains in dynamic traversals. Second, we present the design of a prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to our scheduling algorithm. Finally, we conduct an experimental evaluation of multi-chain prefetching and compare its performance against two existing techniques: jump pointer prefetching and prefetch arrays. Our results show that multi-chain prefetching improves the execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite and by 8% across four SPECInt CPU2000 benchmarks. Multi-chain prefetching also outperforms jump pointer prefetching and prefetch arrays by 28% on Olden, and by 12% on SPECInt. Furthermore, speculation can enable multi-chain prefetching for some dynamic traversal codes, but our technique loses its effectiveness when the pointer-chain traversal order is unpredictable. Finally, we also show that combining multi-chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone
Keywords :
data structures; parallel memories; performance evaluation; processor scheduling; storage management; Olden benchmark suite; SPECInt CPU2000 benchmarks; composed data structure traversal; dynamic traversals; execution time; hardware prefetch engine; inter-chain memory parallelism; jump pointer prefetching; memory latency tolerance; multi-chain prefetching; multiple independent painter chains; offline analysis; performance; pointer-based data structures; pointer-chain traversal order; pointer-chasing codes; pointer-chasing kernels; prefetch arrays; scheduling algorithm; serialized cache misses; speculation; static traversals; Algorithm design and analysis; Data structures; Delay; Engines; Hardware; Kernel; Parallel processing; Prefetching; Processor scheduling; Scheduling algorithm;
Conference_Titel :
Parallel Architectures and Compilation Techniques, 2001. Proceedings. 2001 International Conference on
Conference_Location :
Barcelona
Print_ISBN :
0-7695-1363-8
DOI :
10.1109/PACT.2001.953307