• DocumentCode
    693407
  • Title
    Energy Analysis of Hadoop Cluster Failure Recovery
  • Author
    Weiyue Xu ; Ying Lu
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Nebraska-Lincoln, Lincoln, NE, USA
  • fYear
    2013
  • fDate
    16-18 Dec. 2013
  • Firstpage
    141
  • Lastpage
    146
  • Abstract
    Energy efficiency is now an important metric for evaluating a computing system. However, saving energy is challenging because of many constraints. For example, Hadoop, one of the most popular distributed processing frameworks, randomly distributes three replicas of each data block to improve performance and fault tolerance. This mechanism limits the number of machines that can be turned off to save energy without affecting data availability. To overcome this limitation, previous research introduced a mechanism called the covering subset, which maintains a set of active nodes to ensure the immediate availability of data even when all other nodes are turned off. The covering-subset-based mechanism works smoothly if no failure happens. However, a node in the covering subset may fail. In this paper, we study energy-efficient failure recovery in Hadoop clusters. Rather than using only replication, which Hadoop adopts by default, we investigate both replication and erasure coding as possible redundancy mechanisms. We develop failure recovery algorithms for both systems and analytically compare their energy efficiency.
  • Keywords
    distributed processing; fault tolerant computing; pattern clustering; power aware computing; redundancy; system recovery; Hadoop cluster failure recovery; Hadoop system; computing system; covering subset; covering subset based mechanism; data availability; data block; distributed processing frameworks; energy analysis; energy efficiency; energy saving; energy-efficient failure recovery; erasure coding; failure recovery algorithms; fault tolerance; random distribution; redundancy mechanisms; Data transfer; Decoding; Distributed databases; Encoding; Energy consumption; Fault tolerance; Fault tolerant systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2013 International Conference on
  • Conference_Location
    Taipei
  • Print_ISBN
    978-1-4799-2418-9
  • Type
    conf
  • DOI
    10.1109/PDCAT.2013.29
  • Filename
    6904246