Regional Information Center for Science and Technology - Improving the Shuffle of Hadoop MapReduce

DocumentCode :

3435017

Title :

Improving the Shuffle of Hadoop MapReduce

Author :

Jingui Li ; Xuelian Lin ; Xiaolong Cui ; Yue Ye

Author_Institution :

Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China

Volume :

fYear :

2013

fDate :

2-5 Dec. 2013

Firstpage :

266

Lastpage :

273

Abstract :

As an efficient parallel computing system based on the MapReduce model, Hadoop is widely used for large-scale data analysis such as data mining, machine learning and scientific simulation. However, MapReduce still has performance problems, especially in the shuffle phase. To address these problems, this paper proposes a lightweight, independent shuffle service component with a more efficient I/O policy to replace the existing shuffle phase in MapReduce. We also describe how to implement the shuffle service in three steps: extracting the shuffle from the reduce task as a separate shuffle task, reconstructing the shuffle task as a service, and improving the I/O scheduling policy on the map side. Furthermore, both simulated experiments and comparative studies of MapReduce jobs are conducted to evaluate the performance of our improvements. The results show that our approach can decrease the overall job execution time and make full use of cluster resources.
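For context on the phase the paper targets, the following is a minimal in-memory Python sketch of the baseline MapReduce shuffle, not the authors' shuffle service: each map task partitions its output by reducer and sorts each partition by key, and each reducer fetches its segment from every map output and merge-sorts them before grouping values by key. The function names and the CRC32-based partitioner are hypothetical stand-ins; Hadoop's default `HashPartitioner` uses the key's Java `hashCode()` modulo the number of reduce tasks, and a real shuffle moves segments through disk spills and over HTTP rather than through Python lists.

```python
import heapq
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hypothetical stand-in for Hadoop's HashPartitioner: a deterministic
    # CRC32 hash replaces Java's hashCode(), so assignments differ from Hadoop's.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(records, num_reducers):
    """Partition one map task's (key, value) output by reducer, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition(key, num_reducers)].append((key, value))
    # sorted() is stable, so values with equal keys keep their emit order.
    return {p: sorted(kvs, key=itemgetter(0)) for p, kvs in partitions.items()}


def shuffle_to_reducer(map_outputs, reducer_id):
    """Fetch this reducer's segment from every map output and merge the sorted segments."""
    segments = [out.get(reducer_id, []) for out in map_outputs]
    # heapq.merge performs a k-way merge of already-sorted segments; ties are
    # taken from earlier segments first, so map-task order is preserved per key.
    merged = heapq.merge(*segments, key=itemgetter(0))
    # Group consecutive equal keys into (key, [values]) as a reducer would see them.
    return [(k, [v for _, v in grp]) for k, grp in groupby(merged, key=itemgetter(0))]


if __name__ == "__main__":
    map1 = map_side([("a", 1), ("b", 2), ("a", 3)], num_reducers=2)
    map2 = map_side([("b", 4), ("c", 5)], num_reducers=2)
    for r in range(2):
        print(r, shuffle_to_reducer([map1, map2], r))
```

The paper's contribution concerns where this fetch-and-merge work runs (as a separate service rather than inside the reduce task) and how map-side disk I/O is scheduled; the sketch only shows the data movement that any such design must preserve.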

Keywords :

data analysis; data mining; input-output programs; learning (artificial intelligence); parallel programming; public domain software; software performance evaluation; Hadoop MapReduce shuffle improvement; I-O scheduling policy improvement; Map sides; data mining; large-scale data analysis; machine learning; parallel computing system; performance evaluation; reduce task; scientific simulation; shuffle extraction; shuffle service component; shuffle task-as-a-service; Bandwidth; Computational modeling; Data models; Facebook; Google; Memory management; Protocols; hadoop; mapreduce; shuffle;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on

Conference_Location :

Bristol

Type :

conf

DOI :

10.1109/CloudCom.2013.42

Filename :

6753807

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3435017