• DocumentCode
    1799875
  • Title
    Arbitrary Modulus Indexing
  • Author
    Diamond, Jeffrey R. ; Fussell, Donald S. ; Keckler, Stephen W.
  • Author_Institution
    Univ. of Texas at Austin, Austin, TX, USA
  • fYear
    2014
  • fDate
    13-17 Dec. 2014
  • Firstpage
    140
  • Lastpage
    152
  • Abstract
    Modern high performance processors require memory systems that can provide access to data at a rate that is well matched to the processor's computation rate. Common to such systems is the organization of memory into local high speed memory banks that can be accessed in parallel. Associative lookup of values is made efficient through indexing instead of associative memories. These techniques lose effectiveness when data locations are not mapped uniformly to the banks or cache locations, leading to bottlenecks that arise from excess demand on a subset of locations. Address mapping is most easily performed by indexing the banks using a mod (2^N) indexing scheme, but such schemes interact poorly with the memory access patterns of many computations, making resource conflicts a significant memory system bottleneck. Previous work has assumed that prime moduli are the best choices to alleviate conflicts and has concentrated on finding efficient implementations for them. In this paper, we introduce a new scheme called Arbitrary Modulus Indexing (AMI) that can be implemented efficiently for all moduli, matching or improving the efficiency of the best existing schemes for primes while allowing great flexibility in choosing a modulus to optimize cost/performance trade-offs. We also demonstrate that, for a memory-intensive workload on a modern replay-style GPU architecture, prime moduli are not in general the best choices for memory bank and cache set mappings. Applying AMI to a set of memory-intensive benchmarks eliminates 98% of bank and set conflicts, resulting in an average speedup of 24% over an aggressive baseline system and a 64% average reduction in memory system replays at reasonable implementation cost.
  • Keywords
    cache storage; graphics processing units; indexing; AMI; address mapping; aggressive baseline system; arbitrary modulus indexing; cache locations; data locations; high performance processors; local high speed memory banks; memory access patterns; memory intensive benchmark; memory system bottleneck; memory-intensive workload; prime moduli; processor computation rate; replay-style GPU architecture; Adders; Arrays; Graphics processing units; Hardware; Indexing; GPU caches; fast division and modulus; index schemes; prime-banking; replay architectures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
  • Conference_Location
    Cambridge
  • ISSN
    1072-4451
  • Type
    conf
  • DOI
    10.1109/MICRO.2014.13
  • Filename
    7011384