DocumentCode :
1799820
Title :
SARAH - Statistical Analysis for Resource Allocation in Hadoop
Author :
Martin, Benoit
Author_Institution :
Cloudera, Inc., Palo Alto, CA, USA
fYear :
2014
fDate :
24-26 Sept. 2014
Firstpage :
777
Lastpage :
782
Abstract :
Improving the performance of big data applications requires understanding the size and distribution of the input and intermediate data sets. Gaining this understanding and translating it into resource settings is challenging. SARAH provides a set of tools that analyze input and intermediate data sets and recommend configuration settings and performance optimizations. The statistics generated by SARAH are stored persistently, updated incrementally, and usable across the several processing frameworks available in Apache Hadoop. In this paper we present the SARAH tool set, describe several Hadoop use cases for statistics, and illustrate how effective statistics are for balancing the reduce workload of Map-Reduce jobs that process web server log file data.
Keywords :
Big Data; file servers; resource allocation; statistical analysis; Apache Hadoop; Big Data applications; SARAH; Web server log file data; configuration settings; performance optimizations; reduce workload balancing; resource settings; statistical analysis for resource allocation in Hadoop; Conferences; Privacy; Security; Hadoop; Map-Reduce; big data; performance tuning; statistical analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/TrustCom.2014.102
Filename :
7011326