Title :
PSG-Codes: An Erasure Codes Family with High Fault Tolerance and Fast Recovery
Author :
Shiyi Li;Cao Qiang;Lei Tian;Shenggang Wan;Lu Qian;Changsheng Xie
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. &
Abstract :
Because hard disk failure rates have barely improved and reconstructing a TB-level disk typically takes days, multiple concurrent disk or storage-node failures have become common in datacenter storage systems. As a result, the erasure coding schemes used in datacenters must meet the critical requirements of high fault tolerance, high storage efficiency, and fast fault recovery. In this paper, we introduce PSG-Codes, a new family of XOR-based non-MDS erasure codes that can tolerate up to 12 disk/node failures. The basic idea behind PSG-Codes is to partition disks into groups and exploit short parity chains to generate parity units. The parity chains are then further shortened by varying the number of parity elements in each strip. We conduct a simulation-based study to search the configuration parameter space of PSG-Codes, and prove that PSG-Codes can tolerate up to 12 disk/node failures. Compared with WEAVER codes, a well-known family of XOR-based non-MDS codes, PSG-Codes have higher storage efficiency and lower reconstruction cost. Moreover, the storage efficiency and performance of PSG-Codes are competitive with those of LRC codes, a state-of-the-art family of GF-based non-MDS codes.
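The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the PSG-Codes construction, which the abstract does not specify; it is a generic single-parity-per-group XOR scheme, and the names `partition`, `encode_groups`, and `recover`, as well as the group size of 3, are assumptions chosen for illustration. It only shows why a short parity chain keeps recovery cheap: rebuilding a lost unit reads its own group, not every disk.

```python
from functools import reduce

def xor_bytes(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def partition(disks, group_size):
    """Split the list of data units into consecutive groups (hypothetical layout)."""
    return [disks[i:i + group_size] for i in range(0, len(disks), group_size)]

def encode_groups(disks, group_size):
    """Compute one XOR parity unit per group; each parity chain spans one group only."""
    return [xor_bytes(group) for group in partition(disks, group_size)]

def recover(group, parity, lost_index):
    """Rebuild one lost unit from the surviving units of its group plus the group parity."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_bytes(survivors + [parity])

# Six toy data units of 4 bytes each, split into two groups of three.
disks = [bytes([i] * 4) for i in range(1, 7)]
groups = partition(disks, 3)
parities = encode_groups(disks, 3)

# Lose the first unit of the second group and rebuild it from its short chain.
rebuilt = recover(groups[1], parities[1], lost_index=0)
```

In this toy layout, recovering one unit reads two surviving units and one parity unit, regardless of how many groups exist. A single parity per group tolerates only one failure per group; tolerating up to 12 failures, as the abstract claims for PSG-Codes, requires the additional structure described in the paper itself.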
Keywords :
"Fault tolerance","Fault tolerant systems","Strips","Encoding","Complexity theory","Acceleration"
Conference_Title :
Reliable Distributed Systems (SRDS), 2015 IEEE 34th Symposium on
Electronic_ISBN :
1060-9857
DOI :
10.1109/SRDS.2015.39