DocumentCode :
807539
Title :
Improved disk-drive failure warnings
Author :
Hughes, Gordon F. ; Murray, Joseph F. ; Kreutz-Delgado, Kenneth ; Elkan, Charles
Author_Institution :
Center for Magnetic Recording Res., California Univ., San Diego, La Jolla, CA, USA
Volume :
51
Issue :
3
fYear :
2002
fDate :
9/1/2002 12:00:00 AM
Firstpage :
350
Lastpage :
357
Abstract :
Improved methods are proposed for disk-drive failure prediction. The SMART (self monitoring and reporting technology) failure prediction system is currently implemented in disk-drives. Its purpose is to predict the near-term failure of an individual hard disk-drive, and issue a backup warning to prevent data loss. Two experimental tests of SMART show only moderate accuracy at low false-alarm rates. (A rate of 0.2% of total drives per year implies that 20% of drive returns would be good drives, relative to ≈1% annual failure rate of drives). This requirement for very low false-alarm rates is well known in medical diagnostic tests for rare diseases, and methodology used there suggests ways to improve SMART. Two improved SMART algorithms are proposed. They use the SMART internal drive attribute measurements in present drives. The present warning-algorithm based on maximum error thresholds is replaced by distribution-free statistical hypothesis tests. These improved algorithms are computationally simple enough to be implemented in drive microprocessor firmware code. They require only integer sort operations to put several hundred attribute values in rank order. Some tens of these ranks are added up and the SMART warning is issued if the sum exceeds a prestored limit. These new algorithms were tested on 3744 drives of 2 models. They gave 3-4 times higher correct prediction accuracy than error thresholds on will-fail drives, at 0.2% false-alarm rate. The highest accuracies achievable are modest (40%-60%). Care was taken to test will-fail drive prediction accuracy on data independent of the algorithm design data. Additional work is needed to verify and apply these algorithms in actual drive design. They can also be useful in drive failure analysis engineering. It might be possible to screen drives in manufacturing using SMART attributes. 
Marginal drives might be detected before substantial final test time is invested in them, thereby decreasing manufacturing cost, and possibly decreasing overall field failure rates.
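The rank-sum step described in the abstract (sort the attribute values into rank order, add up some of the ranks, warn if the sum exceeds a prestored limit) can be illustrated with a minimal sketch. The abstract does not give the exact algorithms, so this is a generic Wilcoxon-style rank-sum check, not the authors' implementation. The function names, the sample values, the limit, the midrank handling of ties, and the assumption that larger attribute values indicate degradation are all hypothetical.

```python
from typing import Sequence


def rank_sum(reference: Sequence[int], recent: Sequence[int]) -> float:
    """Sum of the ranks held by `recent` in the pooled, sorted sample.

    Tied values share the average of the ranks they span (midranks).
    This tie rule is an assumption; the abstract does not specify one.
    """
    # Tag each value with its origin: 0 = reference, 1 = recent.
    pooled = [(v, 0) for v in reference] + [(v, 1) for v in recent]
    pooled.sort(key=lambda pair: pair[0])

    total = 0.0
    i = 0
    n = len(pooled)
    while i < n:
        # Find the run of equal values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; take their mean.
        midrank = (i + 1 + j) / 2
        total += midrank * sum(flag for _, flag in pooled[i:j])
        i = j
    return total


def smart_warning(reference: Sequence[int], recent: Sequence[int],
                  limit: float) -> bool:
    """Issue a warning when the rank sum exceeds a prestored limit.

    `limit` is a hypothetical value; in practice it would be chosen
    to meet a target false-alarm rate.
    """
    return rank_sum(reference, recent) > limit


# Illustrative data only: six reference readings, three recent readings.
reference_values = [1, 2, 3, 4, 5, 6]
recent_values = [7, 8, 9]
print(rank_sum(reference_values, recent_values))           # 24.0
print(smart_warning(reference_values, recent_values, 20))  # True
```

Because the test uses only ranks, it makes no assumption about the distribution of the attribute values, which is what "distribution-free" refers to in the abstract.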
Keywords :
disc drives; failure analysis; magnetic recording; SMART; SMART failure prediction system; data loss prevention; disk drive; distribution-free statistical hypothesis tests; drive failure analysis engineering; drive microprocessor firmware code; failure prediction; integer sort operations; internal drive attribute measurements; maximum error thresholds; medical diagnostic tests; near-term failure prediction; predictive failure analysis; rare diseases; self monitoring and reporting technology; very low false-alarm rates; warning-algorithm; will-fail drive prediction accuracy testing; Accuracy; Algorithm design and analysis; Condition monitoring; Diseases; Manufacturing; Medical diagnosis; Medical tests; Microprocessors; Microprogramming; Testing
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/TR.2002.802886
Filename :
1028408