Regional Information Center for Science and Technology - B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors

DocumentCode :

1799913

Title :

B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors

Author :

Kadjo, David ; Kim, Jung-Ho ; Sharma, Parmanand ; Panda, Reena ; Gratz, Paul ; Jimenez, Daniel

Year :

2014

Date :

13-17 Dec. 2014

Firstpage :

623

Lastpage :

634

Abstract :

For decades, the primary tools in alleviating the "Memory Wall" have been large cache hierarchies and data prefetchers. Both approaches become more challenging in modern chip-multiprocessor (CMP) design. Increasing the last-level cache (LLC) size yields diminishing returns in terms of performance per watt; given VLSI power scaling trends, this approach becomes hard to justify. These trends also impact hardware budgets for prefetchers. Moreover, in the context of CMPs running multiple concurrent processes, prefetching accuracy is critical to prevent cache pollution effects. These concerns point to the need for a light-weight prefetcher with high accuracy. Existing data prefetchers may generally be classified as low-overhead and low accuracy (Next-n, Stride, etc.) or high-overhead and high accuracy (STeMS, ISB). We propose B-Fetch: a data prefetcher driven by branch prediction and effective address value speculation. B-Fetch leverages control flow prediction to generate an expected future path of the executing application. It then speculatively computes the effective address of the load instructions along that path based upon a history of past register transformations. Detailed simulation using a cycle-accurate simulator shows a geometric mean speedup of 23.4% for single-threaded workloads, improving to 28.6% for multi-application workloads over a baseline system without prefetching. We find that B-Fetch outperforms an existing "best-of-class" light-weight prefetcher under single-threaded and multiprogrammed workloads by 9% on average, with 65% less storage overhead.
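
The two steps named in the abstract (follow the predicted control-flow path, then speculate load addresses from past register changes) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the `BlockInfo` record, the `prefetch_candidates` function, the per-block register deltas, and the fixed lookahead depth are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BlockInfo:
    """Hypothetical per-basic-block record learned from past executions."""
    # Stand-in for a branch predictor's output: the block expected to run next.
    predicted_next: Optional[str]
    # Assumed form of "register transformation" history: net change per register.
    reg_deltas: Dict[str, int] = field(default_factory=dict)
    # Loads in the block, each described as (base register, offset).
    loads: List[Tuple[str, int]] = field(default_factory=list)


def prefetch_candidates(blocks: Dict[str, BlockInfo], start: str,
                        regs: Dict[str, int], depth: int) -> List[int]:
    """Walk the predicted path and speculate load addresses along it."""
    spec_regs = dict(regs)  # speculative copy; the caller's state is untouched
    addresses: List[int] = []
    block: Optional[str] = start
    for _ in range(depth):
        info = blocks.get(block) if block is not None else None
        if info is None:
            break  # no history for this block, stop speculating
        for base, offset in info.loads:
            if base in spec_regs:
                addresses.append(spec_regs[base] + offset)
        # Apply the remembered register changes before moving to the next block.
        for reg, delta in info.reg_deltas.items():
            if reg in spec_regs:
                spec_regs[reg] += delta
        block = info.predicted_next
    return addresses


# Usage: a loop block that advances r1 by 8 bytes per iteration.
blocks = {"loop": BlockInfo("loop", {"r1": 8}, [("r1", 0)])}
print([hex(a) for a in prefetch_candidates(blocks, "loop", {"r1": 0x1000}, 3)])
```

In this toy model the lookahead depth bounds how far ahead of execution the speculation runs; how the actual design limits lookahead or filters low-confidence paths is not stated in the abstract.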

Keywords :

VLSI; cache storage; microprocessor chips; multiprocessing systems; multiprogramming; power aware computing; B-Fetch; CMP design; LLC; VLSI power scaling; baseline system; branch prediction directed prefetching; cache hierarchies; cache pollution effects; chip-multiprocessor design; concurrent processes; control flow prediction; high-overhead and high accuracy data prefetchers; last-level cache; light-weight prefetcher; low-overhead and low accuracy data prefetchers; memory wall; multiapplication workloads; multiprogrammed workloads; single-threaded workloads; Accuracy; Benchmark testing; Engines; Hardware; Pipelines; Prefetching; Registers; Bfetch; Branch Prediction; Chip-Multiprocessors; Data Cache; Prefetching;

Language :

English

Publisher :

IEEE

Conference_Title :

Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on

Conference_Location :

Cambridge

ISSN :

1072-4451

Type :

conf

DOI :

10.1109/MICRO.2014.29

Filename :

7011422

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1799913