Regional Information Center for Science and Technology - Exploiting Decoding Computational Locality to Improve the I/O Performance of an XOR-Coded Storage Cluster under Concurrent Failures

DocumentCode :

187027

Title :

Exploiting Decoding Computational Locality to Improve the I/O Performance of an XOR-Coded Storage Cluster under Concurrent Failures

Author :

Shiyi Li ; Xubin He ; Shenggang Wan ; Yuhua Guo ; Ping Huang ; Di Chen ; Qiang Cao ; Changsheng Xie

Author_Institution :

Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China

fYear :

2014

fDate :

6-9 Oct. 2014

Firstpage :

125

Lastpage :

135

Abstract :

In today's large data centers, hundreds to thousands of nodes are deployed as storage clusters to provide cloud and big data storage services, where failures are not rare. Therefore, efficient data redundancy technologies are needed to ensure data availability and reliability. Compared to traditional replication-based technology, erasure codes that tolerate multiple failures provide availability and reliability at a much lower cost. However, erasure-coded storage clusters, particularly XOR-coded ones, suffer from performance problems caused by degraded reads under concurrent node failures. With the traditional centralized decoding method, a large amount of extra data has to be transmitted over the network to service degraded reads. In particular, degraded reads in XOR-coded stripes with concurrent failures result in notably high network traffic. To address this problem, we propose a novel decoding approach called Local Decoding First, or LDF for short. By exploiting the decoding computational locality of XOR-coded storage clusters, LDF significantly reduces the required network traffic and hence the access latency of degraded reads, thus improving I/O throughput. A prototype of LDF with two typical XOR codes has been implemented in the popular distributed file system HDFS on a storage cluster composed of 40 nodes. The experimental results show that LDF dramatically reduces the network traffic under concurrent node failures and thus improves both I/O throughput and access latency.
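
The abstract does not spell out the LDF algorithm, so the following is only a minimal sketch of the general idea it names: because XOR is associative, a node holding several blocks needed for a repair can combine them locally and ship one partial result instead of every block. The layout (three surviving nodes, a single parity over five blocks) and the function names `xor_blocks`, `centralized_decode`, and `local_first_decode` are assumptions made for illustration and are not taken from the paper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a non-empty list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def centralized_decode(nodes):
    """Ship every needed block to the reader, then XOR them there.

    Returns (recovered block, number of blocks sent over the network).
    """
    shipped = [blk for blocks in nodes.values() for blk in blocks]
    return xor_blocks(shipped), len(shipped)


def local_first_decode(nodes):
    """Each node XORs its own needed blocks and ships one partial result.

    Returns (recovered block, number of blocks sent over the network).
    """
    partials = [xor_blocks(blocks) for blocks in nodes.values()]
    return xor_blocks(partials), len(partials)


# Hypothetical stripe: d[0] is lost; parity is the XOR of all five data blocks.
d = [bytes([i] * 4) for i in range(1, 6)]
parity = xor_blocks(d)
# Assumed placement of the surviving blocks needed to rebuild d[0].
nodes = {"A": [d[1], d[2]], "B": [d[3], d[4]], "C": [parity]}

central, central_sent = centralized_decode(nodes)
local, local_sent = local_first_decode(nodes)
assert central == local == d[0]
print(central_sent, local_sent)  # 5 blocks shipped vs. 3
```

In this toy layout both strategies recover the same block, but the local-first variant sends one block per surviving node rather than one per surviving block; the traffic figures reported in the paper come from its own codes and HDFS prototype, not from this sketch.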

Keywords :

Big Data; cloud computing; concurrency (computers); replicated databases; storage management; Big Data storage service; HDFS; I/O performance; I/O throughput; LDF; XOR-coded storage clusters; XOR-coded stripes; access latency; centralized decoding method; cloud storage service; concurrent node failures; data availability; data redundancy technologies; data reliability; decoding approach; decoding computational locality; degraded reads; distributed file system; erasure codes; large data centers; local decoding first; network traffic; replication; Availability; Data communication; Decoding; Equations; Silicon; Strips; Throughput; distributed systems; erasure codes; reliability; storage clusters;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Reliable Distributed Systems (SRDS), 2014 IEEE 33rd International Symposium on

Conference_Location :

Nara

Type :

conf

DOI :

10.1109/SRDS.2014.36

Filename :

6983387

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=187027