• DocumentCode
    170718
  • Title
    Distributed data storage systems with opportunistic repair
  • Author
    Aggarwal, Vaneet ; Tian, Chao ; Vaishampayan, Vinay A. ; Chen, Y.-F.R.
  • Author_Institution
    AT&T Labs-Research, Florham Park, NJ, USA
  • fYear
    2014
  • fDate
    April 27, 2014 to May 2, 2014
  • Firstpage
    1833
  • Lastpage
    1841
  • Abstract
    The reliability of erasure-coded distributed storage systems, as measured by the mean time to data loss (MTTDL), depends on the repair bandwidth of the code. Repair-efficient codes provide reliability values several orders of magnitude better than conventional erasure codes. Current state-of-the-art codes fix the number of helper nodes (nodes participating in repair) a priori. In practice, however, it is desirable to let network traffic conditions determine the number of helper nodes adaptively. In this work, we propose an opportunistic repair framework to address this issue. We show that there exists a threshold on the storage overhead below which such an opportunistic approach loses no efficiency relative to the optimal storage-repair-bandwidth tradeoff; i.e., it is possible to construct a code that is simultaneously optimal for different numbers of helper nodes. We further examine the benefits of such opportunistic codes and derive the MTTDL improvement for two repair models: one with limited total repair bandwidth and the other with limited individual-node repair bandwidth. In both settings, we show orders-of-magnitude improvement in MTTDL. Finally, the proposed framework is examined in a network setting, where a significant improvement in MTTDL is observed.
  • Keywords
    storage area networks; storage management; telecommunication traffic; MTTDL; erasure-coded distributed data storage system reliability; helper nodes; limited individual-node repair bandwidth; limited total repair bandwidth; mean time-to-data loss; network traffic conditions; opportunistic codes; opportunistic repair framework; optimal storage-repair-bandwidth tradeoff; reliability values; repair models; repair-efficient codes; storage overhead; Bandwidth; Computers; Conferences; Loss measurement; Maintenance engineering; Peer-to-peer computing; Reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2014 Proceedings IEEE
  • Conference_Location
    Toronto, ON
  • Type
    conf
  • DOI
    10.1109/INFOCOM.2014.6848122
  • Filename
    6848122