• DocumentCode
    2550074
  • Title

    Analysis of shared memory misses and reference patterns

  • Author

    Rothman, Jeffrey B. ; Smith, Alan Jay

  • Author_Institution
    Lyris Technol. Inc., Berkeley, CA, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    187
  • Lastpage
    198
  • Abstract
    Shared bus computer systems permit the relatively simple and efficient implementation of cache consistency algorithms, but the shared bus is a bottleneck which limits performance. False sharing can be an important source of unnecessary traffic for invalidation-based protocols, elimination of which can provide significant performance improvements. For many multiprocessor workloads, however, most misses are true sharing plus cold start misses. Regardless of the cause of cache misses, the largest fraction of bus traffic are words transferred between caches without being accessed, which we refer to as dead sharing. We establish here new methods for characterizing cache block reference patterns, and we measure how these patterns change with variation in workload and block size. Our results show that 42 percent of 64-byte cache blocks are invalidated before more than one word has been read from the block and that 58 percent of blocks that have been modified only have a single word modified before an invalidation to the block occurs. Approximately 50 percent of blocks written and subsequently read by other caches show no use of the newly written information before the block is again invalidated. In addition to our general analysis of reference patterns, we also present a detailed analysis of dead sharing for each shared memory multiprocessor program studied. We find that the worst 10 blocks (based on most total misses) from each of our traces contribute almost 50 percent of the false shearing misses and almost 20 percent of the true sharing misses (on average). A relatively simple restructuring of four of our workloads based on analysis of these 10 worst blocks leads to a 21 percent reduction in overall misses and a 15 percent reduction in execution time. Permitting the block size to vary (as could be accomplished with a sector cache) shows that bus traffic can be reduced by 88 percent (for 64-byte blocks) while also decreasing the miss ratio by 35 percent
  • Keywords
    cache storage; protocols; shared memory systems; cache block reference patterns; cache consistency algorithms; false sharing; invalidation-based protocols; multiprocessor workloads; reference patterns; shared bus computer systems; shared memory misses; Coherence; Computer science; Hardware; Interference; Multiprocessing systems; Pattern analysis; Protocols; Size measurement; Software performance; Sun;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer Design, 2000. Proceedings. 2000 International Conference on
  • Conference_Location
    Austin, TX
  • ISSN
    1063-6404
  • Print_ISBN
    0-7695-0801-4
  • Type

    conf

  • DOI
    10.1109/ICCD.2000.878285
  • Filename
    878285