• DocumentCode
    802130
  • Title

    Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses

  • Author

    Mutlu, Onur ; Kim, Hyesoon ; Patt, Yale N.

  • Author_Institution
    Microsoft Res., Redmond, WA
  • Volume
    55
  • Issue
    12
  • fYear
    2006
  • Firstpage
    1491
  • Lastpage
    1508
  • Abstract
    While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta (AVD) prediction. An AVD predictor keeps track of the address (pointer) load instructions for which the arithmetic difference (i.e., delta) between the effective address and the data value is stable. If such a load instruction incurs a long-latency cache miss during runahead execution, its data value is predicted by subtracting the stable delta from its effective address. This prediction enables the preexecution of dependent instructions, including load instructions that incur long-latency cache misses. We analyze why and for what kind of loads AVD prediction works and describe the design of an implementable AVD predictor. We also describe simple hardware and software optimizations that can significantly improve the benefits of AVD prediction and analyze the interaction of AVD prediction with runahead efficiency techniques and stream-based data prefetching. Our analysis shows that AVD prediction is complementary to these techniques. Our results show that augmenting a runahead processor with a simple, 16-entry AVD predictor improves the average execution time of a set of pointer-intensive applications by 14.3 percent (7.5 percent excluding benchmark health).
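The prediction rule in the abstract (track loads whose delta = effective address - data value is stable, then predict value = effective address - delta on a miss) can be illustrated with a minimal software sketch. This is not the paper's hardware design: the direct-mapped table indexed by load PC, the saturating confidence counter, the threshold of 2, and the names `AVDPredictor`, `train`, and `predict` are all hypothetical. Only the 16-entry size comes from the abstract. The sketch also assumes the caller trains it only on address (pointer) loads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AVDEntry:
    tag: int         # PC of the load that owns this entry
    delta: int       # last observed (effective address - data value)
    confidence: int  # saturating counter; hypothetical confidence scheme


class AVDPredictor:
    """Hypothetical direct-mapped AVD table (16 entries, as in the abstract)."""

    def __init__(self, num_entries: int = 16, threshold: int = 2, max_conf: int = 3):
        self.num_entries = num_entries
        self.threshold = threshold  # assumed minimum confidence before predicting
        self.max_conf = max_conf    # assumed saturation point of the counter
        self.table = [None] * num_entries  # each slot: Optional[AVDEntry]

    def _index(self, pc: int) -> int:
        # Assumption: index the table by load PC modulo its size.
        return pc % self.num_entries

    def train(self, pc: int, address: int, value: int) -> None:
        """Update with a completed pointer load's effective address and loaded value."""
        delta = address - value
        i = self._index(pc)
        entry = self.table[i]
        if entry is None or entry.tag != pc:
            # New or conflicting load: allocate the entry with zero confidence.
            self.table[i] = AVDEntry(pc, delta, 0)
        elif entry.delta == delta:
            # Same delta seen again: the address-value delta looks stable.
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # Delta changed: remember the new one and reset confidence.
            entry.delta = delta
            entry.confidence = 0

    def predict(self, pc: int, address: int) -> Optional[int]:
        """Predict a missing load's value as address - delta, or None if not confident."""
        entry = self.table[self._index(pc)]
        if entry is not None and entry.tag == pc and entry.confidence >= self.threshold:
            return address - entry.delta
        return None


if __name__ == "__main__":
    # Linked-list traversal where each node's next pointer (at offset 0)
    # points 32 bytes ahead, so delta = A - (A + 32) = -32 is stable.
    p = AVDPredictor()
    pc = 0x400
    for node in range(0x1000, 0x1000 + 4 * 32, 32):
        p.train(pc, node, node + 32)
    print(hex(p.predict(pc, 0x2000)))  # prints 0x2020
```

Requiring an exact repeat of the delta before predicting means the sketch only predicts for loads whose pointed-to data sits at a fixed offset from the pointer itself, such as the regularly allocated linked list in the usage example.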
  • Keywords
    DRAM chips; cache storage; instruction sets; microprocessor chips; parallel architectures; address load instruction; address-value delta prediction; arithmetic difference; dependent cache miss parallelization; dependent instruction preexecution; hardware optimization; hardware technique; independent long-latency cache miss; load instruction; memory-level parallelism; pointer-intensive application; runahead execution; runahead processor; single data stream architecture; software optimization; stream-based data prefetching; Arithmetic; Delay; Energy consumption; Hardware; Microprocessors; Prefetching; Process design; Random access memory; Registers; Switches; Single data stream architectures; memory-level parallelism; runahead execution; value prediction;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2006.191
  • Filename
    1717383