• DocumentCode
    623716
  • Title

    RapidRAID: Pipelined erasure codes for fast data archival in distributed storage systems

  • Author

    Pamies-Juarez, Lluis ; Datta, Anwitaman ; Oggier, Frederique

  • Author_Institution
    Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2013
  • fDate
    14-19 April 2013
  • Firstpage
    1294
  • Lastpage
    1302
  • Abstract
    To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded, while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to such atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multiple-object archival. We further present RapidRAID codes, an explicit family of pipelined erasure codes which provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently, the reduction is up to 20%.
  • Keywords
    computer centres; distributed databases; forward error correction; pipeline processing; Amazon EC2 instances; RapidRAID codes; atomic operation; data centers; data reliability; distributed storage systems; fast data archival; pipelined coding strategy; pipelined erasure codes; single-object encodings; storage overheads; storage-centric erasure codes; Distributed databases; Encoding; Fault tolerant systems; Pipelines; Redundancy; archival; distributed storage; erasure codes; migration;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2013 Proceedings IEEE
  • Conference_Location
    Turin
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4673-5944-3
  • Type

    conf

  • DOI
    10.1109/INFCOM.2013.6566922
  • Filename
    6566922