• DocumentCode
    177308
  • Title

    Fractal++: Closing the performance gap between fractal and conventional coherence

  • Author

    Voskuilen, Gwendolyn ; Vijaykumar, T.N.

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2014
  • fDate
    14-18 June 2014
  • Firstpage
    409
  • Lastpage
    420
  • Abstract
    Cache coherence protocol bugs can cause multicores to fail. Existing coherence verification approaches incur state explosion at small scales or require considerable human effort. As protocols' complexity and multicores' core counts increase, verification continues to be a challenge. Recently, researchers proposed fractal coherence which achieves scalable verification by enforcing observational equivalence between sub-systems in the coherence protocol. A larger sub-system is verified implicitly if a smaller sub-system has been verified. Unfortunately, fractal protocols suffer from two fundamental limitations: (1) indirect-communication: sub-systems cannot directly communicate and (2) partially-serial-invalidations: cores must be invalidated in a specific, serial order. These limitations disallow common performance optimizations used by conventional directory protocols: reply-forwarding where caches communicate directly and parallel invalidations. Therefore, fractal protocols lack performance scalability while directory protocols lack verification scalability. To enable both performance and verification scalability, we propose Fractal++ which employs a new class of protocol optimizations for verification-constrained architectures: decoupled-replies, contention-hints, and fully-parallel-fractal-invalidations. The first two optimizations allow reply-forwarding-like performance while the third optimization enables parallel invalidations in fractal protocols. Unlike conventional protocols, Fractal++ preserves observational equivalence and hence is scalably verifiable. In 32-core simulations of single- and four-socket systems, Fractal++ performs nearly as well as a directory protocol while providing scalable verifiability whereas the best-performing previous fractal protocol performs 8% on average and up to 26% worse with a single-socket and 12% on average and up to 34% worse with a longer-latency multi-socket system.
  • Keywords
    cache storage; formal verification; parallel processing; 32-core simulations; Fractal++; cache coherence protocol bugs; coherence verification approaches; contention-hints; decoupled-replies; directory protocols; four-socket system; fractal coherence; fractal protocols; fully-parallel-fractal-invalidations; indirect-communication; longer-latency multisocket system; multicores; observational equivalence; parallel invalidations; partially-serial-invalidations; performance gap; performance optimizations; performance scalability; protocol optimizations; reply-forwarding; single-socket system; state explosion; verification scalability; verification-constrained architectures; Coherence; Erbium; Fractals; Multicore processing; Optimization; Protocols; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4799-4396-8
  • Type

    conf

  • DOI
    10.1109/ISCA.2014.6853211
  • Filename
    6853211