Title :
A mechanism to verify cache coherence transactions in multicore systems
Author :
Rodrigues, Rance ; Koren, Israel ; Kundu, Sandip
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Massachusetts, Amherst, MA, USA
Abstract :
The functional correctness of shared memory applications executing on multicore and multiprocessor systems is supported by cache coherence protocols. The correct operation of these applications thus depends on the correctness of the cache coherence transactions. However, verifying the correctness of these transactions is not trivial, since even simple coherence protocols have multiple states. Transitions among the states can fail due to aging of devices or single event upsets. In this paper we present a centralized mechanism for online verification of cache coherence transactions in snoopy bus multicore systems. We make use of an architecture that we previously proposed for opportunistic Dual Modular Redundancy (DMR). This architecture includes, in addition to the general-purpose cores, a diminutive core called the Sentry Core (SC) that is small and simple and thus can be assumed to be fault-free. Like other cores, the SC has access to the shared bus and is aware of the cache coherence protocol. It monitors all bus transactions and, by observing the current state of the cache line being addressed and the type of operation (e.g., read or write), it knows the expected next state for that cache line. Deviation from the expected behavior indicates a possible error. Our preliminary experiments show that a significant fraction of the coherence transactions can be verified by our scheme.
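The checking idea in the abstract (current line state plus operation type gives the expected next state, and any deviation is flagged) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name a protocol, so a simple three-state MSI snoopy protocol is assumed here. The names `State`, `BusOp`, `expected_next`, and `SentryChecker` are hypothetical, and the `reported` argument is a stand-in for whatever means the SC uses to learn the state each cache actually reached, which the abstract does not describe.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class BusOp(Enum):
    BUS_RD = "BusRd"    # read miss: requester wants a shared copy
    BUS_RDX = "BusRdX"  # write miss or upgrade: requester wants ownership


def expected_next(state, op, is_requester):
    """Expected next state of one cache line under the assumed MSI protocol."""
    if is_requester:
        return State.S if op is BusOp.BUS_RD else State.M
    # Snooping (non-requesting) core.
    if op is BusOp.BUS_RD:
        # A Modified copy is downgraded; Shared and Invalid are unchanged.
        return State.S if state is State.M else state
    return State.I  # BusRdX invalidates every other copy


class SentryChecker:
    """Hypothetical model of the checker: keeps a shadow copy of line states."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.shadow = {}  # (core, addr) -> State; absent means Invalid

    def state_of(self, core, addr):
        return self.shadow.get((core, addr), State.I)

    def observe(self, requester, op, addr, reported):
        """Check one bus transaction.

        reported maps every core id to the state its cache actually reached
        (an assumption of this sketch). Returns a list of mismatches as
        (core, addr, expected, reported) tuples.
        """
        errors = []
        for core in range(self.num_cores):
            current = self.state_of(core, addr)
            expected = expected_next(current, op, core == requester)
            if reported[core] is not expected:
                errors.append((core, addr, expected, reported[core]))
            # The shadow copy follows the protocol, not the faulty cache.
            self.shadow[(core, addr)] = expected
        return errors


checker = SentryChecker(num_cores=2)
# Core 0 reads line 0x40; both caches report the expected states.
ok = checker.observe(0, BusOp.BUS_RD, 0x40, {0: State.S, 1: State.I})
# Core 1 writes the line, but core 0 fails to invalidate its copy.
bad = checker.observe(1, BusOp.BUS_RDX, 0x40, {0: State.S, 1: State.M})
print(ok, bad)
```

In this sketch the shadow table is updated with the expected state rather than the reported one, so a single faulty transition does not corrupt the checker's later predictions. That is a design choice of the example, not something stated in the abstract.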
Keywords :
cache storage; fault tolerant computing; multiprocessing systems; performance evaluation; protocols; shared memory systems; transaction processing; DMR; SC; bus transactions; cache coherence protocols; cache coherence transactions; cache line current state; dual modular redundancy; fault-free; functional correctness; multiprocessor systems; sentry core; shared bus; shared memory applications; snoopy bus multicore systems; Benchmark testing; Coherence; Fault tolerance; Hardware; Monitoring; Multicore processing; Protocols; Online error detection; cache coherence; centralized mechanism; verification;
Conference_Title :
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2012 IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4673-3043-5
DOI :
10.1109/DFT.2012.6378226