DocumentCode :
1799913
Title :
B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors
Author :
Kadjo, David ; Kim, Jung-Ho ; Sharma, Parmanand ; Panda, Reena ; Gratz, Paul ; Jimenez, Daniel
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
623
Lastpage :
634
Abstract :
For decades, the primary tools in alleviating the "Memory Wall" have been large cache hierarchies and data prefetchers. Both approaches become more challenging in modern chip-multiprocessor (CMP) design. Increasing the last-level cache (LLC) size yields diminishing returns in terms of performance per Watt; given VLSI power scaling trends, this approach becomes hard to justify. These trends also impact hardware budgets for prefetchers. Moreover, in the context of CMPs running multiple concurrent processes, prefetching accuracy is critical to prevent cache pollution effects. These concerns point to the need for a light-weight prefetcher with high accuracy. Existing data prefetchers may generally be classified as low-overhead and low accuracy (Next-n, Stride, etc.) or high-overhead and high accuracy (STeMS, ISB). We propose B-Fetch: a data prefetcher driven by branch prediction and effective address value speculation. B-Fetch leverages control flow prediction to generate an expected future path of the executing application. It then speculatively computes the effective address of the load instructions along that path based upon a history of past register transformations. Detailed simulation using a cycle-accurate simulator shows a geometric mean speedup of 23.4% for single-threaded workloads, improving to 28.6% for multi-application workloads, over a baseline system without prefetching. We find that B-Fetch outperforms an existing "best-of-class" light-weight prefetcher under single-threaded and multiprogrammed workloads by 9% on average, with 65% less storage overhead.
Keywords :
VLSI; cache storage; microprocessor chips; multiprocessing systems; multiprogramming; power aware computing; B-Fetch; CMP design; LLC; VLSI power scaling; baseline system; branch prediction directed prefetching; cache hierarchies; cache pollution effects; chip-multiprocessor design; concurrent processes; control flow prediction; high-overhead and high accuracy data prefetchers; last-level cache; light-weight prefetcher; low-overhead and low accuracy data prefetchers; memory wall; multiapplication workloads; multiprogrammed workloads; single-threaded workloads; Accuracy; Benchmark testing; Engines; Hardware; Pipelines; Prefetching; Registers; Bfetch; Branch Prediction; Chip-Multiprocessors; Data Cache; Prefetching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.29
Filename :
7011422