Title :
A cost-based heterogeneous recovery scheme for distributed storage systems with RAID-6 codes
Author :
Zhu, Yunfeng ; Lee, Patrick P C ; Xiang, Liping ; Xu, Yinlong ; Gao, Lingling
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
Modern distributed storage systems provide large-scale, fault-tolerant data storage. To reduce the probability of data unavailability, it is important to recover the lost data of any failed storage node efficiently. In practice, storage nodes are of heterogeneous types and have different transmission bandwidths. Thus, traditional recovery solutions that simply minimize the number of data blocks being read may no longer be optimal in a heterogeneous environment. We propose a cost-based heterogeneous recovery (CHR) algorithm for RAID-6-coded storage systems. We formulate the recovery problem as an optimization model in which storage nodes are associated with generic costs. We narrow down the solution space of the model to make it practically tractable, while still achieving the global optimal solution in most cases. We implement different recovery algorithms and conduct testbed experiments on a real networked storage system with heterogeneous storage devices. We show that our CHR algorithm reduces the total recovery time of existing recovery solutions in various scenarios.
Keywords :
distributed processing; fault tolerance; minimisation; storage management; CHR algorithm; RAID-6-coded storage systems; cost-based heterogeneous recovery scheme; data block minimization; data loss recovery; data unavailability probability reduction; distributed storage systems; generic costs; global optimal solution; heterogeneous storage devices; large-scale fault-tolerant data storage; networked storage system; optimization model; solution space; storage node failure; testbed experiments; total recovery time reduction; transmission bandwidths; Bandwidth; Cloud computing; Encoding; Measurement; Optimization; Peer to peer computing; Reliability; RAID-6 codes; distributed storage system; experimentation; failure recovery; node heterogeneity
Conference_Title :
Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-1624-8
Electronic_ISBN :
1530-0889
DOI :
10.1109/DSN.2012.6263934