DocumentCode :
29203
Title :
On the Speedup of Recovery in Large-Scale Erasure-Coded Storage Systems
Author :
Yunfeng Zhu ; Patrick P. C. Lee ; Yinlong Xu ; Yuchong Hu ; Liping Xiang
Author_Institution :
Univ. of Sci. & Technol. of China, Hefei, China
Volume :
25
Issue :
7
fYear :
2014
fDate :
July 2014
Firstpage :
1830
Lastpage :
1840
Abstract :
Modern storage systems stripe redundant data across multiple nodes to provide availability guarantees against node failures. One form of data redundancy is based on XOR-based erasure codes, which use only XOR operations for encoding and decoding. In addition to tolerating failures, a storage system must also provide fast failure recovery to reduce the window of vulnerability. This work addresses the problem of speeding up the recovery of a single-node failure for general XOR-based erasure codes. We propose a replace recovery algorithm, which uses a hill-climbing technique to search for a fast recovery solution, such that the solution search can be completed within a short time period. We further extend the algorithm to adapt to the scenario where nodes have heterogeneous capabilities (e.g., processing power and transmission bandwidth). We implement our replace recovery algorithm atop a parallelized architecture to demonstrate its feasibility. We conduct experiments on a networked storage system testbed, and show that our replace recovery algorithm uses less recovery time than the conventional recovery approach.
Keywords :
fault tolerant computing; storage management; XOR operations; XOR-based erasure codes; availability guarantees; data redundancy; fast recovery solution; hill-climbing technique; large-scale erasure-coded storage systems; networked storage system testbed; node failures; parallelized architecture; replace recovery algorithm; single-node failure recovery; vulnerability window; Algorithm design and analysis; Distributed databases; Encoding; Equations; Generators; Mathematical model; Strips; XOR-coded storage system; recovery algorithm; single-node failure;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2013.244
Filename :
6613479