DocumentCode
1847900
Title
Performance of Left Outer Join on Hadoop with Right Side within Single Node Memory Size
Author
Byambajav, Byambajargal ; Wlodarczyk, Tomasz Wiktor ; Rong, Chunming ; LePendu, Paea ; Shah, Nigam
Author_Institution
Dept. of Comput. Sci. & Electr. Eng., Univ. of Stavanger, Stavanger, Norway
fYear
2012
fDate
26-29 March 2012
Firstpage
1075
Lastpage
1080
Abstract
In this paper we compare the performance of different implementations of the join operation in Hadoop in a scenario where the right side of the join fits within the memory of a single node. We present results for several implementations, both in pure Map Reduce and in Pig, all based on HDFS. We also compare the distributed performance of those implementations with a single-node implementation in MySQL. Results show that the Pig implementations fall short of the pure Map Reduce versions by a larger margin than expected. Moreover, we notice that Map tasks seem to be the element that influences performance the most, especially for the potentially more efficient methods. So far, we have achieved the best performance using a singleton pattern join. However, there are reasons to believe that this performance can still be improved with better control of the number of Map tasks.
Keywords
SQL; parallel programming; storage management; HDFS; Hadoop; Map Reduce; MySQL; Pig; single node memory size; Bioinformatics; Computer architecture; Context; Indexes; Java; Ontologies; Semantics; Join; MapReduce; Semantic Expansion
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications Workshops (WAINA), 2012 26th International Conference on
Conference_Location
Fukuoka
Print_ISBN
978-1-4673-0867-0
Type
conf
DOI
10.1109/WAINA.2012.20
Filename
6185392