Title :
Grouping Blocks for MapReduce Co-Locality
Author :
Xiao Yu ; Bo Hong
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
Avoiding off-switch communication is critical to the performance of a MapReduce/Hadoop cluster. Current efforts in Hadoop focus only on minimizing off-switch communication for map tasks, yet reduce tasks still shuffle data across the whole cluster because file blocks (and hence map tasks) are scattered. In this paper, we argue that grouping blocks in a few racks can greatly decrease the amount of off-switch data exchange and therefore shorten the execution time of jobs in the cluster. We propose mechanisms to place data in a grouped fashion and to schedule tasks accordingly. We explore the trade-off between reduced off-switch communication and loss of parallelism, and we discuss methods to mitigate that loss. Extensive experiments show that our method significantly reduces off-switch communication and, as a result, decreases job execution time by up to 56%.
Keywords :
electronic data interchange; parallel processing; pattern clustering; scheduling; Hadoop cluster; MapReduce cluster; MapReduce co-locality; job execution time; map task; off-switch communication; off-switch data exchange; task scheduling; task shuffle data; Bandwidth; History; Parallel processing; Runtime; Schedules; Scheduling; Switches; Group Data Blocks; Map/Reduce Co-locality; MapReduce/Hadoop;
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
DOI :
10.1109/IPDPS.2015.16