DocumentCode :
2901787
Title :
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
Author :
Gold, Brian T. ; Falsafi, Babak ; Hoe, James C.
Author_Institution :
Comput. Archit. Lab. (CALCM), Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2009
fDate :
16-18 Nov. 2009
Firstpage :
195
Lastpage :
201
Abstract :
Distributed shared-memory (DSM) multiprocessors provide a scalable hardware platform, but lack the necessary redundancy for mainframe-level reliability and availability. Chip-level redundancy in a DSM server faces a key challenge: the increased latency to check results among redundant components. To address performance overheads, we propose a checking filter that reduces the number of checking operations impeding the critical path of execution. Furthermore, we propose to decouple checking operations from the coherence protocol, which simplifies the implementation and permits reuse of existing coherence controller hardware. Our simulation results of commercial workloads indicate average performance overhead is within 4% (9% maximum) of tightly coupled DMR solutions.
Keywords :
distributed shared memory systems; multiprocessing systems; program verification; checking filter; chip-level redundancy; coherence controller hardware; coherence protocol; distributed shared-memory multiprocessors; scalable hardware platform; Availability; Circuit faults; Clocks; Computer architecture; Delay; Filters; Hardware; Protection; Protocols; Redundancy;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable Computing, 2009. PRDC '09. 15th IEEE Pacific Rim International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3849-5
Type :
conf
DOI :
10.1109/PRDC.2009.39
Filename :
5368542