• DocumentCode
    2475613
  • Title
    A mechanism to verify cache coherence transactions in multicore systems
  • Author
    Rodrigues, Rance ; Koren, Israel ; Kundu, Sandip
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Massachusetts, Amherst, MA, USA
  • fYear
    2012
  • fDate
    3-5 Oct. 2012
  • Firstpage
    211
  • Lastpage
    216
  • Abstract
    The functional correctness of shared memory applications executing on multicores and multiprocessor systems is supported by cache coherence protocols. The correct operation of these applications thus depends on the correctness of the cache coherence transactions. However, verifying the correctness of these transactions is not trivial since even simple coherence protocols have multiple states. Transitions among the states can fail due to aging of devices or single event upsets. In this paper we present a centralized mechanism for online verification of cache coherence transactions in snoopy bus multicore systems. We make use of an architecture that we previously proposed for opportunistic Dual Modular Redundancy (DMR). This architecture includes, in addition to the general-purpose cores, a diminutive core called the Sentry Core (SC) that is small and simple and thus can be assumed to be fault-free. Like other cores, the SC has access to the shared bus and is aware of the cache coherence protocol. It monitors all bus transactions and, by observing the current state of the cache line being addressed and the type of operation (e.g., read or write), it knows the expected next state for that cache line. Deviation from the expected behavior indicates a possible error. Our preliminary experiments show that a significant fraction of the coherence transactions can be verified by our scheme.
  • Keywords
    cache storage; fault tolerant computing; multiprocessing systems; performance evaluation; protocols; shared memory systems; transaction processing; DMR; SC; bus transactions; cache coherence protocols; cache coherence transactions; cache line current state; dual modular redundancy; fault-free; functional correctness; multiprocessor systems; sentry core; shared bus; shared memory applications; snoopy bus multicore systems; Benchmark testing; Coherence; Fault tolerance; Hardware; Monitoring; Multicore processing; Protocols; Online error detection; cache coherence; centralized mechanism; verification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2012 IEEE International Symposium on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4673-3043-5
  • Type
    conf
  • DOI
    10.1109/DFT.2012.6378226
  • Filename
    6378226
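
The checking idea described in the abstract (observe the current state of a cache line and the bus operation, derive the expected next state, and flag any deviation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the coherence protocol, so a simplified three-state MSI protocol is used, and the names `State`, `BusOp`, `expected_next_state`, and `check_transaction` are hypothetical and do not come from the paper.

```python
from enum import Enum


class State(Enum):
    # Assumed simplified MSI protocol; the abstract does not name the protocol.
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class BusOp(Enum):
    # Hypothetical bus operations: a read miss and a read-for-ownership (write).
    BUS_RD = "BusRd"
    BUS_RDX = "BusRdX"


def expected_next_state(current, op, is_requester):
    """Expected next state of one core's copy of the addressed cache line."""
    if is_requester:
        # The requesting core ends up Shared after a read, Modified after a write.
        return State.S if op is BusOp.BUS_RD else State.M
    if op is BusOp.BUS_RD:
        # A snooping core holding the line Modified downgrades to Shared;
        # Shared and Invalid copies are unchanged by a remote read.
        return State.S if current is State.M else current
    # A remote write invalidates every other copy.
    return State.I


def check_transaction(states_before, states_after, op, requester):
    """Return the ids of cores whose observed next state deviates from the expected one."""
    suspects = []
    for core, before in states_before.items():
        expected = expected_next_state(before, op, core == requester)
        if states_after[core] is not expected:
            suspects.append(core)
    return suspects


# Core 1 reads a line that core 0 holds in Modified.
before = {0: State.M, 1: State.I}
correct_after = {0: State.S, 1: State.S}
faulty_after = {0: State.M, 1: State.S}  # core 0 failed to downgrade

print(check_transaction(before, correct_after, BusOp.BUS_RD, requester=1))
print(check_transaction(before, faulty_after, BusOp.BUS_RD, requester=1))
```

In this sketch the checker is a pure lookup over (current state, operation, requester or snooper), which reflects the centralized, observation-only role the abstract assigns to the Sentry Core; how the actual mechanism obtains the cache line states is not stated in the abstract and is not modeled here.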