Title :
Substituting Disk Failure Avoidance for Redundancy in Wide Area Fault Tolerant Storage Systems
Author :
Brumgard, Christopher ; Beck, Micah
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
Abstract :
The primary mechanism for overcoming faults in modern storage systems is to introduce redundancy in the form of replication and/or error correcting codes. The costs of such redundancy in hardware, system availability and overall complexity can be substantial, depending on the number and pattern of faults that are handled. In this paper, we describe a system that seeks to reduce the need for costly redundancy through disk failure avoidance, using adaptive heuristics that predict such failures. While a number of predictive factors such as hard drive utilization rate, age, SMART errors, and model can be used, the initial work we present here focuses on SMART errors. Our approach can predict where near-term disk failures are more likely to occur, enabling proactive movement or replication of at-risk data and thus maintaining data integrity and availability. Our strategy can reduce costs due to redundant storage without compromising these important requirements.
Keywords :
data integrity; disc drives; error correction codes; fault tolerant computing; hard discs; storage management; system recovery; adaptive heuristics; data availability; disk failure avoidance; error correcting codes; modern storage systems; wide area fault tolerant storage systems; Data models; Educational institutions; IP networks; Reliability; Resource management; Sociology; Statistics; SMART errors; hard drives; logistical networking
Conference_Title :
Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2893-7
DOI :
10.1109/ClusterW.2012.39