• DocumentCode
    2846303
  • Title
    Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches
  • Author
    Jin, Lei; Cho, Sangyeun
  • Author_Institution
    Dept. of Comput. Sci., Pittsburgh Univ., Pittsburgh, PA
  • fYear
    2008
  • fDate
    9-12 Sept. 2008
  • Firstpage
    487
  • Lastpage
    494
  • Abstract
    This paper presents a two-part study on managing distributed NUCA (non-uniform cache architecture) L2 caches in a future many-core processor to obtain high single-thread program performance. The first part of our study is a limit study in which we determine data-to-cache-slice mappings at memory page granularity, based on detailed inter-page conflict information derived from a program's memory reference trace. By considering cache access latency and cache miss rate simultaneously when mapping data to L2 cache slices, this "oracle" scheme outperforms the conventional shared caching scheme by up to 208%, with an average of 45%, on a sixteen-core processor. In the second part of the study, we propose and evaluate a dynamic cache management scheme that determines the home cache slice and cache bin for memory pages without any static program information. The dynamic scheme outperforms the shared caching scheme by up to 191%, with an average of 32%, achieving much of the performance we observed in the limit study. We also find that the proposed dynamic scheme adapts well to the behavior of multiprogrammed workloads and performs significantly better than both the private caching scheme and the shared caching scheme.
  • Keywords
    cache storage; distributed processing; multiprocessing systems; cache access latency; cache slice mappings; distributed onchip L2 caches; interpage conflict information; many core processor; memory page granularity; nonuniform cache architecture; shared caching scheme; single-thread program performance; Bandwidth; Computer architecture; Computer science; Delay; Memory management; Multiprocessor interconnection networks; Network-on-a-chip; Parallel processing; Simultaneous localization and mapping; Switches; multicore; performance; single thread;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 2008. ICPP '08. 37th International Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    0190-3918
  • Print_ISBN
    978-0-7695-3374-2
  • Electronic_ISBN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2008.29
  • Filename
    4625885