• DocumentCode
    3256412
  • Title

    Aurora: Adaptive Block Replication in Distributed File Systems

  • Author

    Zhang, Qi ; Zhang, Sai Qian ; Leon-Garcia, Alberto ; Boutaba, Raouf

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
  • fYear
    2015
  • fDate
    June 29 2015-July 2 2015
  • Firstpage
    442
  • Lastpage
    451
  • Abstract
    Distributed file systems such as the Google File System and the Hadoop Distributed File System have been used to store large volumes of data in cloud data centers. These systems divide data sets into blocks of fixed size and replicate them over multiple machines to achieve both reliability and efficiency. Recent studies have shown that data blocks tend to have a wide disparity in popularity. In this context, the naive block replication schemes used by these systems often cause an uneven load distribution across machines, which reduces the overall I/O throughput of the system. While many replication algorithms have been proposed, existing solutions have not carefully studied how to place data blocks so that load is balanced across machines while node- and rack-level reliability requirements are satisfied. In this paper, we study the dynamic data replication problem with the goal of balancing machine load while ensuring that machine- and rack-level reliability requirements are met. We propose several local search algorithms that provide constant approximation guarantees, yet are simple and practical to implement. We further present Aurora, a dynamic block placement mechanism that implements these algorithms in the Hadoop Distributed File System with minimal overhead. Through experiments using workload traces from Yahoo! and Facebook, we show that Aurora reduces machine load imbalance by up to 26.9% compared to existing solutions, while satisfying node- and rack-level reliability requirements.
  • Keywords
    cloud computing; computer centres; data handling; distributed databases; network operating systems; parallel processing; reliability; search problems; Aurora; Facebook; Google File System; Hadoop distributed file system; I/O throughput; Naive block replication schemes; Yahoo!; adaptive block replication; cloud data centers; data block placement; dynamic block placement mechanism; dynamic data replication problem; load distribution; local search algorithms; machine load balancing; machine-level reliability requirements; rack-level reliability requirements; Approximation algorithms; Clustering algorithms; Distributed databases; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Distributed file system; HDFS; Hadoop; approximation algorithms; local search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems (ICDCS), 2015 IEEE 35th International Conference on
  • Conference_Location
    Columbus, OH
  • ISSN
    1063-6927
  • Type

    conf

  • DOI
    10.1109/ICDCS.2015.52
  • Filename
    7164930