Title :
Aggregation Decoding for Multi-failure Recovery in Erasure-Coded Storage
Author :
Jing Zhang ; Shanshan Li ; Xiangke Liao ; Xiaodong Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Nat. Univ. of Defence Technol., Changsha, China
Abstract :
Data reliability is a significant issue in large-scale storage systems. Erasure codes provide high data reliability through data recovery, but recovery generates a large amount of network traffic. The bandwidth cost of transmitting the data needed for recovery significantly degrades the performance of the cluster. Existing work treats single failure as the most common failure pattern and focuses mainly on reducing the data transmission cost of single-failure recovery; as a result, it does not support multi-failure recovery efficiently. In this work, we first introduce the Mean Time To Multi-Failure (MTTMF) metric, based on a Markov model, to characterize the frequency and pattern of multi-failures in erasure-coded storage. We then propose Aggregation Decoding, a practical network-topology-aware erasure coding scheme, for multi-failure recovery. To reduce redundant transmission in multi-failure recovery, we propose two joint methods: Aggregation Decoding-based de-redundancy and merging-based de-duplication. Analysis and experimental results demonstrate the importance of the multi-failure recovery problem and the efficiency of our solution, which reduces bandwidth cost by around 40% across different settings.
Keywords :
Markov processes; codes; decoding; merging; storage management; MTTMF metric; Markov model; aggregation decoding based deredundancy; bandwidth cost; data recovery; data reliability; data transmission cost reduction; erasure-coded storage; failure pattern; large-scale storage systems; mean time to multifailure metric; merging based deduplication; multifailure recovery problem; network topology-aware erasure coding scheme; redundant transmission reduction; single-failure recovery; Bandwidth; Decoding; Encoding; Markov processes; Redundancy; Topology; Aggregation Decoding; Erasure-Coded Storage; Multi-Failure Recovery;
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.253