• DocumentCode
    260511
  • Title
    Quantifying and Optimizing the Impact of Victim Cache Line Selection in Manycore Systems

  • Author
    Kandemir, Mahmut ; Ding, Wei ; Guttman, Diana

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2014
  • fDate
    9-11 Sept. 2014
  • Firstpage
    385
  • Lastpage
    394
  • Abstract
    In both architecture and software, the main goal of data locality-oriented optimizations has always been "minimizing the number of cache misses" (especially costly last-level cache misses). However, this paper shows that other metrics, such as the distance between the last-level cache and the memory controller as well as the memory queuing latency, can play an equally important role as far as application performance is concerned. Focusing on a large set of multithreaded applications, we first show that last-level cache "write-backs" (memory writes caused by the displacement of a victim block from the last-level cache) can exhibit significant latencies as well as variances, and then make a case for "relaxing" the strict LRU policy to save write-back cycles in both the on-chip network and the memory queues. Specifically, we explore novel architecture-level schemes that optimize the on-chip network latency, the memory queuing latency, or both, of write-back messages by carefully selecting the victim block to write back at the time of cache replacement. Our extensive experimental evaluation, using 15 multithreaded applications and a cycle-accurate simulation infrastructure, clearly demonstrates that this tradeoff (between cache hit rate and on-chip network/memory queuing latency) pays off in most cases, leading to about 12.2% execution time improvement and 14.9% energy savings in our default 64-core system with 6 memory controllers.
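    The abstract's core idea, relaxing strict LRU so that the victim with the cheapest write-back is chosen, can be illustrated with a small sketch. The code below is not the paper's scheme: the cost model, the k-wide relaxed-LRU window, the 0.1 age weight, and all names (CacheLine, NocModel, select_victim, mc_pending, and so on) are hypothetical assumptions used only to show how NoC hop distance to the home memory controller and memory-queue occupancy could steer victim selection.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <limits>
    #include <vector>

    struct CacheLine {
        std::uint64_t tag;      // line tag (not used by the selection logic itself)
        bool dirty;             // a dirty victim triggers a write-back message
        std::uint32_t lru_rank; // 0 = most recently used; set_size-1 = least recently used
        int home_mc;            // memory controller that owns this line's address
    };

    struct NocModel {
        int mesh_width = 8;          // assumed 8x8 mesh (64 tiles)
        std::vector<int> mc_tile;    // tile position of each memory controller
        std::vector<int> mc_pending; // current write-queue occupancy per controller

        // Manhattan hop count between two tiles of the mesh.
        int hop_distance(int from_tile, int to_tile) const {
            int dx = std::abs(from_tile % mesh_width - to_tile % mesh_width);
            int dy = std::abs(from_tile / mesh_width - to_tile / mesh_width);
            return dx + dy;
        }
        int mc_queue_depth(int mc) const { return mc_pending[mc]; }
    };

    // Relaxed-LRU victim selection: among the k least-recently-used lines of a set,
    // pick the one whose eviction looks cheapest. A clean line costs nothing to evict;
    // a dirty line costs its NoC distance to its home controller plus that controller's
    // queue occupancy. A small age bias limits how much hit rate is traded away.
    int select_victim(const std::vector<CacheLine>& set, int bank_tile,
                      const NocModel& noc, std::size_t k) {
        k = std::min(k, set.size());
        int best = -1;
        double best_cost = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < set.size(); ++i) {
            const CacheLine& line = set[i];
            if (line.lru_rank < set.size() - k)
                continue;  // too recently used: outside the relaxed-LRU window
            double cost = 0.0;
            if (line.dirty) {
                cost = noc.hop_distance(bank_tile, noc.mc_tile[line.home_mc])
                     + noc.mc_queue_depth(line.home_mc);
            }
            cost -= 0.1 * line.lru_rank;  // arbitrary weight favoring older lines
            if (cost < best_cost) {
                best_cost = cost;
                best = static_cast<int>(i);
            }
        }
        return best;  // index of the chosen victim within the set
    }

    int main() {
        NocModel noc;
        noc.mc_tile = {0, 7, 28, 35, 56, 63}; // 6 controllers placed on the mesh
        noc.mc_pending = {3, 12, 5, 1, 9, 2}; // made-up queue depths

        std::vector<CacheLine> set = {
            {0x1a, true, 3, 1},  // LRU line, dirty, its controller has a long queue
            {0x2b, false, 2, 0}, // clean: evicting it needs no write-back at all
            {0x3c, true, 1, 3},  // dirty with a short queue, but too recent for k=2
            {0x4d, true, 0, 2},  // most recently used
        };
        int victim = select_victim(set, /*bank_tile=*/27, noc, /*k=*/2);
        std::printf("victim index: %d\n", victim);  // picks the clean line (index 1)
        return 0;
    }

    In a real design these estimates would come from the on-chip network and memory controller state that the paper's cycle-accurate simulation models track; the weight on LRU age is simply the knob that trades hit rate against write-back latency.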
  • Keywords
    cache storage; multi-threading; multiprocessing systems; LRU policy; architecture-level schemes; cycle-accurate simulation infrastructure; data locality-oriented optimizations; energy savings; last-level cache misses; manycore systems; memory controller; memory queuing latency; multithreaded applications; on-chip network; on-chip network latency optimization; victim cache line selection; Computational modeling; Context; Data models; Memory management; Multicore processing; Optimization; System-on-chip; manycore; memory queuing latency; network-on-chip latency; victim cache line selection;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)
  • Conference_Location
    Paris
  • ISSN
    1526-7539
  • Type
    conf
  • DOI
    10.1109/MASCOTS.2014.54
  • Filename
    7033676