DocumentCode :
46526
Title :
Revisiting Using the Results of Pre-Executed Instructions in Runahead Processors
Author :
Wolff, Sonya R.; Barnes, Ronald D.
Volume :
13
Issue :
2
fYear :
2014
fDate :
July-Dec. 16 2014
Firstpage :
97
Lastpage :
100
Abstract :
Long-latency cache accesses cause significant performance-impacting delays in both in-order and out-of-order processor systems. To address these delays, runahead pre-execution has been shown to produce speedups by warming up cache structures during stalls caused by long-latency memory accesses. While improving cache-related performance, basic runahead approaches do not otherwise utilize the results of accurately pre-executed instructions during normal operation. This simple model of execution is potentially inefficient and performance constraining. However, a previous study showed that exploiting the results of accurately pre-executed runahead instructions in out-of-order processors provides little performance improvement over simple re-execution. This work shows that, unlike in out-of-order runahead architectures, the performance improvement from runahead result use in an in-order pipeline is more significant on average and in some situations dramatic. For a set of SPEC CPU2006 benchmarks that experience performance improvement from basic runahead, adding result use to the pipeline provided an additional speedup of 1.14× (up to 1.48×) for an in-order processor model, compared to only 1.05× (up to 1.16×) for an out-of-order one. When considering benchmarks with poor data cache locality, the average speedup increased to 1.21× for in-order compared to only 1.10× for out-of-order.
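As a loose illustration of the mechanism the abstract describes, the C sketch below contrasts basic runahead, which discards everything computed while the pipeline ran ahead of a long-latency miss, with result reuse, which keeps results of runahead instructions whose operands never depended on the missing load. This is a minimal conceptual sketch, not the authors' simulator or microarchitecture; all structure, register, and function names here are hypothetical.

```c
/* Hypothetical sketch of runahead execution with result reuse
 * (illustrative only; not the model evaluated in the letter). */
#include <stdbool.h>
#include <stdio.h>

#define NUM_REGS 32

typedef struct {
    long regs[NUM_REGS];
    bool valid[NUM_REGS];    /* false if the value depends on the missing load */
    bool written[NUM_REGS];  /* true if the register was produced during runahead */
} RunaheadState;

/* Enter runahead on a long-latency miss: checkpoint architectural state and
 * poison the destination register of the missing load. */
static void enter_runahead(RunaheadState *ra, const long arch_regs[NUM_REGS],
                           int miss_dest_reg) {
    for (int r = 0; r < NUM_REGS; r++) {
        ra->regs[r] = arch_regs[r];
        ra->valid[r] = true;
        ra->written[r] = false;
    }
    ra->valid[miss_dest_reg] = false;
}

/* Pre-execute one ALU instruction during runahead.  Its result is reusable
 * later only if every source operand was valid (i.e., independent of the miss). */
static void runahead_add(RunaheadState *ra, int dest, int src1, int src2) {
    ra->regs[dest]    = ra->regs[src1] + ra->regs[src2];
    ra->valid[dest]   = ra->valid[src1] && ra->valid[src2];
    ra->written[dest] = true;
}

/* On return from the miss: basic runahead discards the runahead state, while
 * result reuse copies back values produced from valid operands, so those
 * instructions need not be re-executed in normal mode. */
static int exit_runahead(long arch_regs[NUM_REGS], const RunaheadState *ra,
                         bool reuse_results) {
    int reused = 0;
    if (reuse_results) {
        for (int r = 0; r < NUM_REGS; r++) {
            if (ra->written[r] && ra->valid[r]) {
                arch_regs[r] = ra->regs[r];
                reused++;
            }
        }
    }
    return reused;
}

int main(void) {
    long arch[NUM_REGS] = {0};
    arch[1] = 10; arch[2] = 20;

    RunaheadState ra;
    enter_runahead(&ra, arch, 3);   /* r3 is the destination of the missing load */
    runahead_add(&ra, 4, 1, 2);     /* independent of the miss: reusable */
    runahead_add(&ra, 5, 3, 1);     /* depends on the miss: must be re-executed */

    int reused = exit_runahead(arch, &ra, true);
    printf("runahead results reusable: %d (r4 = %ld)\n", reused, arch[4]);
    return 0;
}
```

In this toy model, basic runahead corresponds to calling exit_runahead with reuse_results set to false, which recovers only the prefetching benefit; passing true captures the additional benefit the letter quantifies for in-order versus out-of-order pipelines.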
Keywords :
cache storage; multiprocessing systems; SPEC CPU2006 benchmarks; data cache locality; in-order processor systems; long-latency cache accesses; long-latency memory accesses; out-of-order processor systems; out-of-order runahead architectures; preexecuted runahead instructions; runahead processors; Benchmark testing; Hidden Markov models; Out of order; Pipeline processing; Registers; C.1.5.c Superscalar dynamically-scheduled and statically-scheduled implementation; C.1.5.e Memory hierarchy; Memory Wall; Pre-Execution; Runahead;
fLanguage :
English
Journal_Title :
Computer Architecture Letters
Publisher :
IEEE
ISSN :
1556-6056
Type :
jour
DOI :
10.1109/L-CA.2013.21
Filename :
6562693