DocumentCode
170718
Title
Distributed data storage systems with opportunistic repair
Author
Aggarwal, Vaneet ; Chao Tian ; Vaishampayan, Vinay A. ; Chen, Y.-F.R.
Author_Institution
AT&T Labs-Research, Florham Park, NJ, USA
fYear
2014
fDate
April 27 2014-May 2 2014
Firstpage
1833
Lastpage
1841
Abstract
The reliability of erasure-coded distributed storage systems, as measured by the mean time to data loss (MTTDL), depends on the repair bandwidth of the code. Repair-efficient codes provide reliability values several orders of magnitude better than conventional erasure codes. Current state-of-the-art codes fix the number of helper nodes (nodes participating in repair) a priori. In practice, however, it is desirable to let network traffic conditions determine the number of helper nodes adaptively. In this work, we propose an opportunistic repair framework to address this issue. We show that there exists a threshold on the storage overhead below which such an opportunistic approach loses no efficiency relative to the optimal storage-repair-bandwidth tradeoff; i.e., it is possible to construct a code that is simultaneously optimal for different numbers of helper nodes. We further examine the benefits of such opportunistic codes and derive the MTTDL improvement for two repair models: one with limited total repair bandwidth and the other with limited individual-node repair bandwidth. In both settings, we show orders-of-magnitude improvement in MTTDL. Finally, the proposed framework is examined in a network setting, where a significant improvement in MTTDL is observed.
Keywords
storage area networks; storage management; telecommunication traffic; MTTDL; erasure-coded distributed data storage system reliability; helper nodes; limited individual-node repair bandwidth; limited total repair bandwidth; mean time-to-data loss; network traffic conditions; opportunistic codes; opportunistic repair framework; optimal storage-repair-bandwidth tradeoff; reliability values; repair models; repair-efficient codes; storage overhead; Bandwidth; Computers; Conferences; Loss measurement; Maintenance engineering; Peer-to-peer computing; Reliability
fLanguage
English
Publisher
IEEE
Conference_Titel
INFOCOM, 2014 Proceedings IEEE
Conference_Location
Toronto, ON
Type
conf
DOI
10.1109/INFOCOM.2014.6848122
Filename
6848122