Title :
Evaluation of MapReduce in a Large Cluster
Author :
Kc, Kamal; Hsu, Chin-Jung; Freeh, Vincent W.
Author_Institution :
North Carolina State Univ., Raleigh, NC, USA
Abstract :
MapReduce is a widely used framework for running large-scale data processing applications. However, there are very few systematic studies of MapReduce on large clusters, so there is little reference for the expected behavior or issues when running applications in a large cluster. This paper describes our findings from running applications on Pivotal's Analytics Workbench, which consists of a 540-node Hadoop cluster. Our experience sheds light on how applications behave in a large-scale cluster. This paper discusses our experiences in three areas. The first describes the scaling behavior of applications as the dataset size increases. The second discusses the appropriate settings for parallelism and overlap of map and reduce tasks. The third covers general observations. These areas have not been reported or studied previously. Our findings show that IO-intensive applications do not scale as data size increases and that MapReduce applications require different amounts of parallelism and overlap to minimize completion time. Our observations also highlight the need for appropriate memory allocation for a MapReduce component and the importance of decreasing log file size.
Keywords :
data handling; parallel processing; pattern clustering; 540-node Hadoop cluster; MapReduce evaluation; Pivotal analytics workbench; dataset size; large scale data processing; large-scale cluster; memory allocation; Benchmark testing; Containers; Parallel processing; Resource management; Scalability; Throughput; Yarn; Hadoop; MapReduce; large cluster
Conference_Title :
Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on
Conference_Location :
New York City, NY
Print_ISBN :
978-1-4673-7286-2
DOI :
10.1109/CLOUD.2015.68