• DocumentCode
    251759
  • Title

    A Study of Replica Reconstruction Schemes for Multi-rack HDFS Clusters

  • Author

    Higai, Asami ; Takefusa, Atsuko ; Nakada, Hidemoto ; Oguchi, Masato

  • Author_Institution
    Ochanomizu Univ., Tokyo, Japan
  • fYear
    2014
  • fDate
    8-11 Dec. 2014
  • Firstpage
    196
  • Lastpage
    203
  • Abstract
    Distributed file systems, which enable users to manage large amounts of data across multiple commodity computers, have attracted attention as a potential management and processing platform for big data applications. The Hadoop Distributed File System (HDFS) is a widely used open source distributed file system. In HDFS, multiple replicas of each block are stored on separate data nodes for enhanced availability. When a data node failure is detected, replica reconstruction is performed. During this process, the access load on the remaining data nodes that hold replicas of the lost blocks may increase, degrading the overall performance of data processing on the distributed file system. Effective replica reconstruction is therefore important for preventing such performance degradation. In addition, under the HDFS replica placement policy, an HDFS cluster composed of multiple racks must replicate some of the missing blocks onto a different rack to preserve availability. For blocks that require data transfer between racks, both network bandwidth and fault tolerance have to be taken into account. In this paper, we propose replica reconstruction schemes for a multi-rack HDFS cluster and evaluate their effectiveness in multi-rack cluster environments by simulation. In the proposed schemes, data transfer within a rack follows a one-directional ring structure, and inter-rack data transfer is performed in a round-robin manner. We control the streams between racks so as to give priority to blocks that require inter-rack transfer. The experiments show that the proposed schemes reduce execution time and improve fault tolerance. We also confirm that performance improves further when the number of streams between racks is controlled properly, and that the proposed schemes reduce execution time by 16% compared with the default scheme.
  • Keywords
    data handling; distributed databases; Hadoop distributed file system; data node failure; fault tolerance; interrack data transfer; multirack HDFS cluster; multirack cluster environment; one-directional ring structure; open source distributed file system; replica reconstruction scheme; Availability; Bandwidth; Data transfer; Distributed databases; Fault tolerance; Fault tolerant systems; Structural rings; HDFS; data management; distributed file system; replica reconstruction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on
  • Conference_Location
    London
  • Type

    conf

  • DOI
    10.1109/UCC.2014.28
  • Filename
    7027495