A Study of Replica Reconstruction Schemes for Multi-rack HDFS Clusters

Author

Higai, Asami ; Takefusa, Atsuko ; Nakada, Hidemoto ; Oguchi, Masato

Author_Institution

Ochanomizu Univ., Tokyo, Japan

fYear

2014

fDate

8-11 Dec. 2014

Firstpage

196

Lastpage

203

Abstract

Distributed file systems, which enable users to manage large amounts of data across multiple commodity computers, have attracted attention as a potential management and processing platform for big data applications. The Hadoop Distributed File System (HDFS) is a widely used open source distributed file system. In HDFS, multiple replicas of each block are stored on separate data nodes for enhanced availability. When a data node failure is detected, replica reconstruction is performed. During this process, the access load on the surviving data nodes that hold replicas of the lost blocks may increase, which degrades the overall performance of data processing over the distributed file system. Effective replica reconstruction is therefore an important issue in preventing such performance degradation. In addition, an HDFS cluster composed of multiple racks must, for the sake of availability, replicate some of the missing blocks on a different rack in accordance with the HDFS replica placement policy. For blocks that require data transfer between racks, both network bandwidth and fault tolerance have to be taken into account. In this paper, we propose replica reconstruction schemes for a multi-rack HDFS cluster and evaluate their effectiveness in multi-rack cluster environments by simulation. In the proposed schemes, data transfer within a rack is performed over a one-directional ring structure, and inter-rack data transfer is performed in a round-robin manner. We control the streams between racks by giving priority to the blocks that require inter-rack transfer. The experiments show that the proposed schemes are effective in reducing execution time and improving fault tolerance. We also confirm that performance improves further when the number of streams between racks is controlled properly, and that the proposed schemes reduce execution time by 16% compared with the default scheme.
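The transfer pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, node and rack identifiers, and the `needs_inter_rack` predicate are all hypothetical, and the sketch only shows one plausible reading of "one-directional ring", "round robin" and "priority for inter-rack blocks".

```python
from itertools import cycle


def ring_pairs(nodes):
    """Pair each node in a rack with its successor on a one-directional ring.

    Hypothetical helper: returns (sender, receiver) tuples.
    """
    if len(nodes) < 2:
        return []
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def round_robin_racks(blocks, racks, source_rack):
    """Assign each inter-rack block a destination rack, cycling over the
    racks other than the source rack (assumed round-robin policy)."""
    others = [r for r in racks if r != source_rack]
    if not others:
        raise ValueError("need at least one rack other than the source")
    targets = cycle(others)
    return {block: next(targets) for block in blocks}


def order_by_priority(blocks, needs_inter_rack):
    """Schedule blocks that need inter-rack transfer first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(blocks, key=lambda b: not needs_inter_rack(b))
```

For example, `ring_pairs(["dn1", "dn2", "dn3"])` pairs each node with the next one and wraps the last node back to the first, so every node sends to exactly one neighbor and receives from exactly one.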

Keywords

data handling; distributed databases; Hadoop distributed file system; data node failure; fault tolerance; interrack data transfer; multirack HDFS cluster; multirack cluster environment; one-directional ring structure; open source distributed file system; replica reconstruction scheme; Availability; Bandwidth; Data transfer; Distributed databases; Fault tolerance; Fault tolerant systems; Structural rings; HDFS; data management; distributed file system; replica reconstruction

fLanguage

English

Publisher

IEEE

Conference_Title

Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on

Conference_Location

London

Type

conf

DOI

10.1109/UCC.2014.28

Filename

7027495

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=251759