Title :
Evaluating the impact of Undetected Disk Errors in RAID systems
Author :
Rozier, Eric W D ; Belluomini, Wendy ; Deenadhayalan, Veera ; Hafner, Jim ; Rao, K.K. ; Zhou, Pin
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
Date :
June 29 2009-July 2 2009
Abstract :
Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.
Keywords :
RAID; discrete event simulation; system recovery; RAID system; arbitrary storage system; disk reliability; fault model; large-scale system; multiresolution discrete event simulator; silent data corruption event; storage capacity; undetected disk error; Analytical models; Computer errors; Discrete event simulation; Event detection; Fault detection; Large-scale systems; Numerical simulation; Protection; Switches; System testing; modeling; silent data corruption; simulation; undetected disk errors
Conference_Title :
Dependable Systems & Networks, 2009. DSN '09. IEEE/IFIP International Conference on
Conference_Location :
Lisbon
Print_ISBN :
978-1-4244-4422-9
Electronic_ISBN :
978-1-4244-4421-2
DOI :
10.1109/DSN.2009.5270353