DocumentCode
154103
Title
A Case for Resource Efficient Prefetching in Multicores
Author
Khan, Mahrukh ; Sandberg, Anna ; Hagersten, Erik
Author_Institution
Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
101
Lastpage
110
Abstract
Modern processors typically employ sophisticated prefetching techniques to hide memory latency. Hardware prefetching has proven very effective and can speed up some SPEC CPU2006 benchmarks by more than 40% when running in isolation. However, this speedup often comes at the cost of prefetching a significant volume of useless data (sometimes more than twice the data required), which wastes shared last-level cache space and off-chip bandwidth. This paper explores how an accurate, resource-efficient prefetching scheme can improve performance by conserving shared resources in multicores. We present a framework that uses low-overhead runtime sampling and fast cache modeling to accurately identify memory instructions that frequently miss in the cache. We then use this information to automatically insert software prefetches into the application. Our prefetching scheme is accurate and employs cache bypassing whenever possible. These properties reduce off-chip bandwidth consumption and last-level cache pollution. While single-thread performance remains comparable to that of hardware prefetching, the full advantage of the scheme is realized when several cores are used and demand for shared resources grows. We evaluate our method on two modern commodity multicores. Across 180 mixed workloads that fully utilize a multicore, the proposed software prefetching mechanism achieves up to 24% better throughput than hardware prefetching, and 10% better throughput on average.
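As a rough illustration of what an inserted software prefetch with a cache-bypassing hint can look like, here is a minimal C sketch. It is not the authors' tool or their generated code. It assumes GCC or Clang (`__builtin_prefetch`) and a simple streaming loop, and it uses a hypothetical fixed `PREFETCH_DISTANCE`. The function name `sum_with_prefetch` is likewise hypothetical. The locality argument `0` requests a non-temporal prefetch, which on x86 is typically emitted as `prefetchnta`. Many implementations use that hint to limit last-level cache pollution.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements (an assumption for this sketch);
   a real tool would pick it per instruction from the measured miss latency
   and loop iteration time. */
#define PREFETCH_DISTANCE 64

/* Sums an array while issuing a software prefetch PREFETCH_DISTANCE elements
   ahead of the current access. Arguments to __builtin_prefetch: address,
   rw = 0 (read), locality = 0 (no temporal locality). On x86, GCC and Clang
   typically emit prefetchnta for locality 0, a non-temporal hint that many
   implementations use to limit pollution of the shared last-level cache. */
long sum_with_prefetch(const long *a, size_t n) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        }
        sum += a[i];
    }
    return sum;
}
```

Prefetches are hints, so they change only timing, not results. For an array holding 0 through 999, `sum_with_prefetch` returns 499500 whether or not the prefetches are useful.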
Keywords
cache storage; sampling methods; shared memory systems; storage management; SPEC CPU 2006 benchmarks; cache modeling; cache space; commodity multicore; hardware prefetching; last-level cache pollution; low-overhead runtime sampling; memory latency; multicores; off-chip bandwidth consumption; prefetching techniques; resource efficient prefetching; resource-efficient prefetching scheme; shared resources; single-thread performance; Analytical models; Benchmark testing; Hardware; Load modeling; Multicore processing; Prefetching;
fLanguage
English
Publisher
IEEE
Conference_Title
2014 43rd International Conference on Parallel Processing (ICPP)
Conference_Location
Minneapolis, MN
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.19
Filename
6957219