Title :
Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization
Author :
Guo, Zhenhua ; Fox, Geoffrey
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
Abstract :
MapReduce is a widely used model for data-parallel applications. We found that its resource utilization is inefficient when there are not enough tasks to fill all task slots, because the resources "reserved" for idle slots are simply wasted. We propose resource stealing, which enables running tasks to steal the unutilized resources and return them when new tasks are assigned. It makes opportunistic use of otherwise wasted resources to improve overall resource utilization and reduce job execution time. In addition, our practical use of Hadoop shows that the current mechanism for triggering speculative execution creates many unnecessary speculative tasks, which are killed soon after creation because the original tasks complete earlier. To alleviate this issue, we propose Benefit Aware Speculative Execution, which predicts the benefit of running new speculative tasks and greatly reduces unnecessary runs. Finally, MapReduce is mainly optimized for homogeneous environments, and we have observed its inefficiency in heterogeneous network environments in our experiments. We investigate network-heterogeneity-aware scheduling of both map and reduce tasks. Overall, our goal is to enhance Hadoop to cope with significant network heterogeneity and improve resource utilization.
Keywords :
parallel processing; public domain software; resource allocation; security of data; software performance evaluation; Hadoop; MapReduce performance improvement; benefit aware speculative execution; data parallel applications; heterogeneous network environments; homogeneous environments; job execution time reduction; network heterogeneity; overall resource utilization improvement; resource stealing; Bandwidth; Harmonic analysis; Peer to peer computing; Real time systems; Resource management; Schedules; USA Councils; MapReduce; heterogeneity; scheduling; utilization;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
Conference_Location :
Ottawa, ON
Print_ISBN :
978-1-4673-1395-7
DOI :
10.1109/CCGrid.2012.12