DocumentCode
693407
Title
Energy Analysis of Hadoop Cluster Failure Recovery
Author
Weiyue Xu ; Ying Lu
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Nebraska-Lincoln, Lincoln, NE, USA
fYear
2013
fDate
16-18 Dec. 2013
Firstpage
141
Lastpage
146
Abstract
Energy efficiency is now an important metric for evaluating a computing system. However, saving energy is challenging because of many constraints. For example, Hadoop, one of the most popular distributed processing frameworks, randomly distributes three replicas of each data block to improve performance and fault tolerance. This mechanism, however, limits the number of machines that can be turned off to save energy without affecting data availability. To overcome this limitation, previous research introduced a mechanism called covering subset, which maintains a set of active nodes to ensure the immediate availability of data even when all other nodes are turned off. This covering-subset-based mechanism works smoothly if no failure occurs. However, a node in the covering subset may fail. In this paper, we study energy-efficient failure recovery in Hadoop clusters. Rather than using only replication, as a Hadoop system does by default, we investigate both replication and erasure coding as possible redundancy mechanisms. We develop failure recovery algorithms for both systems and analytically compare their energy efficiency.
Keywords
distributed processing; fault tolerant computing; pattern clustering; power aware computing; redundancy; system recovery; Hadoop cluster failure recovery; Hadoop system; computing system; covering subset; covering subset based mechanism; data availability; data block; distributed processing frameworks; energy analysis; energy efficiency; energy saving; energy-efficient failure recovery; erasure coding; failure recovery algorithms; fault tolerance; random distribution; redundancy mechanisms; Data transfer; Decoding; Distributed databases; Encoding; Energy consumption; Fault tolerance; Fault tolerant systems
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2013 International Conference on
Conference_Location
Taipei
Print_ISBN
978-1-4799-2418-9
Type
conf
DOI
10.1109/PDCAT.2013.29
Filename
6904246