Title :
Dynamically Controlling Node-Level Parallelism in Hadoop
Author :
Kc, Kamal ; Freeh, Vincent W.
Author_Institution :
North Carolina State Univ., Raleigh, NC, USA
Abstract :
Hadoop is a widely used large-scale data processing framework. Applications run in Hadoop as containers, the concurrency of which affects both the completion time of an application and system resource usage. When there are too many concurrent containers, resource bottlenecks occur; when there are too few, system resources are underutilized. The default and best-practice settings underutilize resources, which results in longer application completion times. In this work, we develop an approach to dynamically change the parallelism for concurrent containers to suit an application. Our approach ensures efficient utilization of resources and avoids bottlenecks for all types of MapReduce applications. It improves the performance of MapReduce applications by as much as 28% and 60% compared to the best-practice and default settings, respectively.
Keywords :
concurrency control; data handling; parallel processing; resource allocation; Hadoop; MapReduce application; concurrent containers; dynamic node-level parallelism control; large scale data processing framework; resource bottlenecks; system resource usage; Adaptive control; Containers; Measurement; PD control; Resource management; Tuning; Yarn; MapReduce; performance tuning
Conference_Title :
Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on
Conference_Location :
New York City, NY
Print_ISBN :
978-1-4673-7286-2
DOI :
10.1109/CLOUD.2015.49