DocumentCode :
7096
Title :
Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems
Author :
Silei Xu ; Runhui Li ; Patrick P. C. Lee ; Yunfeng Zhu ; Liping Xiang ; Yinlong Xu ; John C. S. Lui
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Volume :
63
Issue :
4
fYear :
2014
fDate :
April 2014
Firstpage :
995
Lastpage :
1007
Abstract :
In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One coding scheme is X-code, which is double-fault tolerant while achieving the optimal update complexity. When a disk/node fails, recovery must be carried out to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared to the conventional recovery approach. Second, we prove that, in general, no optimal recovery scheme for X-code can balance disk reads among different disks within a single stripe. Third, we propose an efficient logical encoding scheme that balances disk reads across a group of stripes for any recovery algorithm (including the MDRR scheme). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments in a networked storage system prototype. Experiments indicate that MDRR reduces recovery time by around 20 percent relative to the conventional approach, showing that our theoretical findings are applicable in practice.
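The idea of saving disk reads by choosing, for each lost data symbol, which parity group to rebuild it from can be illustrated with a small sketch. It is not the MDRR algorithm from the paper: it is a brute-force search over per-symbol choices, and it assumes the standard X-code layout for a prime p (rows 0 to p-3 hold data, row p-2 holds diagonal parity, row p-1 holds anti-diagonal parity). The function names and the "diagonal-only" baseline are hypothetical and chosen only for illustration.

```python
from itertools import product


def parity_groups(p):
    """Map each parity cell (row, col) to the data cells it protects.

    Assumed layout (standard X-code, p prime): rows 0..p-3 hold data,
    row p-2 holds diagonal parity, row p-1 holds anti-diagonal parity.
    """
    groups = {}
    for i in range(p):
        groups[(p - 2, i)] = [(k, (i + k + 2) % p) for k in range(p - 2)]
        groups[(p - 1, i)] = [(k, (i - k - 2) % p) for k in range(p - 2)]
    return groups


def recovery_reads(p, failed, choices):
    """Count the distinct cells read to rebuild column `failed`.

    choices[k] is 0 (use the diagonal group) or 1 (use the anti-diagonal
    group) for the lost data cell in row k.
    """
    groups = parity_groups(p)
    # The two lost parity cells can only be recomputed from their own data cells.
    reads = set(groups[(p - 2, failed)]) | set(groups[(p - 1, failed)])
    for k, use_anti in enumerate(choices):
        if use_anti:
            parity = (p - 1, (failed + k + 2) % p)
        else:
            parity = (p - 2, (failed - k - 2) % p)
        assert (k, failed) in groups[parity]  # sanity check on the layout
        reads.add(parity)
        reads.update(cell for cell in groups[parity] if cell != (k, failed))
    return len(reads)


def conventional_reads(p, failed=0):
    """Baseline (an assumption): rebuild every lost data cell from diagonal parity."""
    return recovery_reads(p, failed, (0,) * (p - 2))


def min_reads(p, failed=0):
    """Exhaustive search over all 2^(p-2) diagonal/anti-diagonal choices."""
    return min(
        recovery_reads(p, failed, choice)
        for choice in product((0, 1), repeat=p - 2)
    )


if __name__ == "__main__":
    for p in (5, 7, 11):
        print(p, conventional_reads(p), min_reads(p))
```

For p = 5 the search finds 12 reads against 13 for the diagonal-only baseline; the gap comes from symbols shared between a diagonal group and an anti-diagonal group, which are read once but used twice. The exhaustive search is exponential in p and is meant only to show the effect on small stripes.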
Keywords :
disc storage; encoding; parallel memories; redundancy; reliability; storage management; system recovery; MDRR; X-code-based optimal recovery scheme; X-code-based parallel storage systems; cloud storage; data availability; data centers; double-fault tolerant coding scheme; logical encoding scheme; minimum-disk-read-recovery; networked storage system prototype; optimal single-disk failure recovery; optimal update complexity; redundancy coding schemes; single disk failure recovery algorithm; Arrays; Complexity theory; Data communication; Encoding; Load management; Peer to peer computing; Reliability; Parallel storage systems; coding theory; data availability; recovery algorithm;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2013.8
Filename :
6409832