Title :
Evaluation and Analysis of GreenHDFS: A Self-Adaptive, Energy-Conserving Variant of the Hadoop Distributed File System
Author :
Kaushik, Rini T. ; Bhandarkar, Milind ; Nahrstedt, Klara
Author_Institution :
Univ. of Illinois, Urbana-Champaign, Urbana, IL, USA
fDate :
Nov. 30, 2010 - Dec. 3, 2010
Abstract :
We present a detailed evaluation and sensitivity analysis of an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS) called GreenHDFS. GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on insightful, data-classification-driven, energy-conserving data placement to realize guaranteed, substantially long periods (several days) of idleness in a significant subset of servers in the Cold Zone. Detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo! points to the viability of GreenHDFS. Simulation results with real-world Yahoo! HDFS traces show that GreenHDFS can achieve a 24% energy cost reduction by doing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates in spite of the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo! (amounting to 38,000 servers), $2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis points out that one-size-fits-all energy-management policies won't suffice in a multi-tenant Hadoop cluster.
Keywords :
cloud computing; distributed databases; energy management systems; pattern clustering; power aware computing; Hadoop cluster; Hadoop distributed file system; Yahoo!; data-classification; energy cost reduction; energy-conserving variant; greenHDFS; lifespan analysis; power management; servers; top-level tenant directory; Degradation; Google; Green products; Production; Random access memory; Servers; Temperature measurement; Cloud-computing; Green; data-intensive computing; energy-conservation; hadoop; hadoop distributed file system; performance; scale-down;
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on
Conference_Location :
Indianapolis, IN
Print_ISBN :
978-1-4244-9405-7
Electronic_ISBN :
978-0-7695-4302-4
DOI :
10.1109/CloudCom.2010.109