DocumentCode :
3435962
Title :
Improving MapReduce Performance by Streaming Input Data from Multiple Replicas
Author :
Jiadong Wu ; Bo Hong
Author_Institution :
Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Volume :
1
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
623
Lastpage :
630
Abstract :
The MapReduce programming model, along with its open-source implementation Hadoop, has provided a cost-effective solution for many data-intensive applications. Hadoop stores data in a distributed fashion and exploits data locality by assigning tasks to the nodes where the data is stored. In many cases, however, accessing remote data (rack-local and off-rack) is inevitable. In this paper, we evaluate the possibility of improving remote data access performance by streaming data from multiple available replicas. The proposed design consists of a circular buffer, a slice reader, and an enhanced DataNode. Such a system is capable of adapting both to the static performance variance caused by network topology and to the dynamic variance caused by congestion. Extensive experiments show that multi-source streaming can significantly improve the throughput of remote data access and accelerate the related map tasks by 10%-20%. In some imbalanced environments, the proposed system can achieve as much as a 4x speedup.
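The idea of reading one block from several replicas at once can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not describe the scheduling policy, so the pull-based rule below (an idle replica takes the next unread slice) is an assumption, and the function name `stream_from_replicas`, the replica names, and the per-slice costs are all hypothetical.

```python
import heapq


def stream_from_replicas(num_slices, slice_cost):
    """Simulate fetching the slices of one block from several replicas.

    slice_cost maps a replica name to the (hypothetical) seconds it needs
    to deliver one slice. Whenever a replica becomes idle it is handed the
    next unread slice, so faster replicas end up serving more slices.
    Returns (finish_time, assignment), where assignment[i] names the
    replica that served slice i.
    """
    # Min-heap of (time at which the replica becomes idle, replica name).
    idle = [(0.0, name) for name in sorted(slice_cost)]
    heapq.heapify(idle)

    assignment = []
    finish_time = 0.0
    for _ in range(num_slices):
        start, name = heapq.heappop(idle)   # earliest idle replica
        done = start + slice_cost[name]     # when this slice arrives
        assignment.append(name)
        finish_time = max(finish_time, done)
        heapq.heappush(idle, (done, name))
    return finish_time, assignment


# Assumed costs: a nearby replica at 1 s per slice, a congested one at 4 s.
costs = {"fast": 1.0, "slow": 4.0}
finish, plan = stream_from_replicas(10, costs)
print(finish, plan.count("fast"), plan.count("slow"))
```

In this sketch the reader never waits on a single slow source: the congested replica still contributes slices, but most of the block is drawn from the faster one, which is the kind of adaptation to static and dynamic variance that the abstract describes.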
Keywords :
buffer storage; data handling; parallel programming; public domain software; replicated databases; software performance evaluation; DataNode; Hadoop open-source implementation; MapReduce performance improvement; MapReduce programming model; circular buffer; cost effective solution; data locality exploitation; data-intensive applications; distributed data storage; dynamic performance variance; input data streaming; multiple replicas; network topology; remote data access performance improvement; static performance variance; Bandwidth; Benchmark testing; Media; Network topology; Peer-to-peer computing; Servers; Throughput; Hadoop; MapReduce; Multi-source; Streaming;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on
Conference_Location :
Bristol
Type :
conf
DOI :
10.1109/CloudCom.2013.88
Filename :
6753854