DocumentCode :
3309339
Title :
Evaluation of distributed recovery in large-scale storage systems
Author :
Xin, Qin ; Miller, Ethan L. ; Schwarz, Thomas J. E., S.J.
Author_Institution :
Storage Syst. Res. Center, California Univ., Santa Cruz, CA, USA
fYear :
2004
fDate :
4-6 June 2004
Firstpage :
172
Lastpage :
181
Abstract :
Storage clusters consisting of thousands of disk drives are now being used both for their large capacity and high throughput. However, their reliability is far worse than that of smaller storage systems due to the increased number of storage nodes. RAID technology is no longer sufficient to guarantee the necessary high data reliability for such systems, because disk rebuild time lengthens as disk capacity grows. We present the fast recovery mechanism (FARM), a distributed recovery approach that exploits excess disk capacity and reduces data recovery time. FARM works in concert with replication and erasure-coding redundancy schemes to dramatically lower the probability of data loss in large-scale storage systems. By simulating system behavior under disk failures, we have examined essential factors that influence system reliability, performance, and cost, such as failure detection, disk bandwidth usage for recovery, disk space utilization, disk drive replacement, and system scale. Our results show the reliability improvement from FARM and demonstrate the impact of these factors on system reliability. Using our techniques, system designers will be better able to build multipetabyte storage systems with much higher reliability at lower cost than previously possible.
Keywords :
RAID; distributed processing; redundancy; system recovery; RAID technology; data recovery time; data reliability; distributed recovery approach; erasure-coding redundancy schemes; failure detection; fast recovery mechanism; large-scale storage system; system performance; system reliability; Application software; Bandwidth; Buildings; Computational modeling; Costs; Disk drives; Internet; Large-scale systems; Reliability engineering; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Distributed Computing, 2004. Proceedings. 13th IEEE International Symposium on
ISSN :
1082-8907
Print_ISBN :
0-7695-2175-4
Type :
conf
DOI :
10.1109/HPDC.2004.1323523
Filename :
1323523