Title :
Self-repairing homomorphic codes for distributed storage systems
Author :
Oggier, Frédérique ; Datta, Anwitaman
Author_Institution :
Div. of Math. Sci., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Erasure codes provide a storage efficient alternative to replication based redundancy in (networked) storage systems. They however entail high communication overhead for maintenance, when some of the encoded fragments are lost and need to be replenished. Such overheads arise from the fundamental need to recreate (or keep separately) first a copy of the whole object before any individual encoded fragment can be generated and replenished. There has recently been intense interest to explore alternatives, most prominent ones being regenerating codes (RGC) and hierarchical codes (HC). We propose as an alternative a new family of codes to improve the maintenance process, called self-repairing codes (SRC), with the following salient features: (a) encoded fragments can be repaired directly from other subsets of encoded fragments by downloading less data than the size of the complete object, ensuring that (b) a fragment is repaired from a fixed number of encoded fragments, the number depending only on how many encoded blocks are missing and independent of which specific blocks are missing. These properties allow for not only low communication overhead to recreate a missing fragment, but also independent reconstruction of different missing fragments in parallel, possibly in different parts of the network. The fundamental difference between SRCs and HCs is that different encoded fragments in HCs do not have symmetric roles (equal importance). Consequently the number of fragments required to replenish a specific fragment in HCs depends on which specific fragments are missing, and not solely on how many. Likewise, object reconstruction may need different number of fragments depending on which fragments are missing. RGCs apply network coding over (n, k) erasure codes, and provide network information flow based limits on the minimal maintenance overheads. RGCs need to communicate with at least k other nodes to recreate any fragment, and the minimal overhead is achieved if only one- - fragment is missing, and information is downloaded from all the other n-1 nodes. We analyze the static resilience of SRCs with respect to erasure codes, and observe that SRCs incur marginally larger storage overhead in order to achieve the aforementioned properties. The salient SRC properties naturally translate to low communication overheads for reconstruction of lost fragments, and allow reconstruction with lower latency by facilitating repairs in parallel. These desirable properties make SRC a practical candidate for networked distributed storage systems.
Keywords :
codes; storage area networks; distributed storage systems; erasure codes; hierarchical codes; low communication overhead; redundancy; regenerating codes; self-repairing homomorphic codes; Decoding; Encoding; Maintenance engineering; Peer to peer computing; Polynomials; Redundancy; Resilience; coding; networked storage; self-repair;
Conference_Titel :
INFOCOM, 2011 Proceedings IEEE
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-9919-9
DOI :
10.1109/INFCOM.2011.5934901