Title :
The RAS Implications of DIMM Connector Failure Rates in Large, Highly Available Server Systems
Author :
Dell, Timothy J.
Author_Institution :
IBM Corp., Essex Junction
Abstract :
The juxtaposition of low-cost dual inline memory module (DIMM) connectors and highly reliable servers has created a difficult reliability, availability, and serviceability (RAS) conundrum: the connector cost must be low enough to allow hundreds of sockets to be used per system, while the system-level reliability must be high enough to prevent connector-related memory failures. This paper explores modeling techniques that can guide system-level fault tolerance decisions, given the propensity of card-edge connectors to experience corrosion-induced failures, and explains why understanding the probability density function (PDF) of the connector failure rate is crucial in establishing the system RAS strategy for DIMM connectors. The effects of both a "low" and a "high" contact failure rate are analyzed under two different PDFs, and the resulting system implications are discussed.
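The abstract's central point, that the shape of the failure distribution and not only its magnitude drives the system RAS strategy, can be illustrated with a small series-system calculation. The sketch below is not the paper's model; it assumes independent contacts, treats any single contact failure as a system-visible event (no fault tolerance), and contrasts a constant-hazard (exponential) distribution with a wear-out (Weibull) distribution, as an illustrative stand-in for the two PDFs. Both are fitted to the same cumulative failure probability at end of life. All numbers (256 sockets, 240 contacts per socket, a 5-year life, Weibull shape 3, and the "low"/"high" per-contact values) are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0

def exponential_cdf(t, rate):
    """P(contact failed by t hours) under a constant hazard rate (per hour)."""
    return -math.expm1(-rate * t)

def weibull_cdf(t, shape, scale):
    """P(contact failed by t hours) under a Weibull (wear-out) model."""
    return -math.expm1(-((t / scale) ** shape))

def fit_exponential(p_life, t_life):
    """Constant rate giving cumulative failure probability p_life at t_life."""
    return -math.log1p(-p_life) / t_life

def fit_weibull_scale(p_life, t_life, shape):
    """Weibull scale giving the same p_life at t_life for the given shape."""
    return t_life / (-math.log1p(-p_life)) ** (1.0 / shape)

def p_system_sees_failure(p_contact, n_contacts):
    """P(at least one of n independent contacts failed) = 1 - (1 - p)^n."""
    return -math.expm1(n_contacts * math.log1p(-p_contact))

def compare(p_life, sockets=256, contacts_per_socket=240,
            life_years=5, shape=3.0):
    """Per-year system failure probability under both hypothetical PDFs."""
    n = sockets * contacts_per_socket          # total contacts in the system
    t_life = life_years * HOURS_PER_YEAR
    rate = fit_exponential(p_life, t_life)
    scale = fit_weibull_scale(p_life, t_life, shape)
    rows = []
    for year in range(1, life_years + 1):
        t = year * HOURS_PER_YEAR
        rows.append((year,
                     p_system_sees_failure(exponential_cdf(t, rate), n),
                     p_system_sees_failure(weibull_cdf(t, shape, scale), n)))
    return rows

if __name__ == "__main__":
    # Hypothetical "low" and "high" per-contact failure probabilities over life.
    for label, p_life in (("low", 1e-7), ("high", 1e-5)):
        print(f"{label}: per-contact failure probability over life = {p_life:g}")
        for year, p_exp, p_wbl in compare(p_life):
            print(f"  year {year}: exponential {p_exp:.3e}  weibull {p_wbl:.3e}")
```

Because both curves are pinned to the same end-of-life value, they agree in year 5, but the wear-out model concentrates failures late in life while the constant-hazard model spreads them evenly. Under these assumptions, the same nominal failure rate can imply very different early-life field behavior, and hence a different choice of connector-level fault tolerance.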
Keywords :
fault tolerance; network servers; DIMM connector failure rates; RAS implications; card-edge connectors; corrosion-induced failures; dual inline memory module connectors; probability density function; server systems; system-level fault tolerance decisions; system-level reliability; Availability; Bit error rate; Connectors; Costs; Cyclic redundancy check; Data communication; Error correction codes; Failure analysis; Fault tolerant systems; Sockets;
Conference_Title :
The 53rd IEEE Holm Conference on Electrical Contacts (2007)
Conference_Location :
Pittsburgh, PA
Print_ISBN :
1-4244-0837-7
Electronic_ISBN :
1-4244-0838-5
DOI :
10.1109/HOLM.2007.4318226