• DocumentCode
    2923131
  • Title
    A cost-based heterogeneous recovery scheme for distributed storage systems with RAID-6 codes

  • Author
    Zhu, Yunfeng ; Lee, Patrick P C ; Xiang, Liping ; Xu, Yinlong ; Gao, Lingling

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2012
  • fDate
    25-28 June 2012
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Modern distributed storage systems provide large-scale, fault-tolerant data storage. To reduce the probability of data unavailability, it is important to recover the lost data of any failed storage node efficiently. In practice, storage nodes are of heterogeneous types and have different transmission bandwidths. Thus, traditional recovery solutions that simply minimize the number of data blocks being read may no longer be optimal in a heterogeneous environment. We propose a cost-based heterogeneous recovery (CHR) algorithm for RAID-6-coded storage systems. We formulate the recovery problem as an optimization model in which storage nodes are associated with generic costs. We narrow down the solution space of the model to make it practically tractable, while still achieving the global optimal solution in most cases. We implement different recovery algorithms and conduct testbed experiments on a real networked storage system with heterogeneous storage devices. We show that our CHR algorithm reduces the total recovery time of existing recovery solutions in various scenarios.
  • Keywords
    distributed processing; fault tolerance; minimisation; storage management; CHR algorithm; RAID-6-coded storage systems; cost-based heterogeneous recovery scheme; data block minimization; data loss recovery; data unavailability probability reduction; distributed storage systems; generic costs; global optimal solution; heterogeneous storage devices; large-scale fault-tolerant data storage; networked storage system; optimization model; solution space; storage node failure; testbed experiments; total recovery time reduction; transmission bandwidths; Bandwidth; Cloud computing; Encoding; Measurement; Optimization; Peer to peer computing; Reliability; RAID-6 codes; distributed storage system; experimentation; failure recovery; node heterogeneity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-0889
  • Print_ISBN
    978-1-4673-1624-8
  • Electronic_ISBN
    1530-0889
  • Type
    conf
  • DOI
    10.1109/DSN.2012.6263934
  • Filename
    6263934