• DocumentCode
    1847900
  • Title
    Performance of Left Outer Join on Hadoop with Right Side within Single Node Memory Size
  • Author
    Byambajav, Byambajargal ; Wlodarczyk, Tomasz Wiktor ; Rong, Chunming ; LePendu, Paea ; Shah, Nigam
  • Author_Institution
    Dept. of Comput. Sci. & Electr. Eng., Univ. of Stavanger, Stavanger, Norway
  • fYear
    2012
  • fDate
    26-29 March 2012
  • Firstpage
    1075
  • Lastpage
    1080
  • Abstract
    In this paper we compare the performance of different implementations of the join operation in Hadoop in a scenario where the right side of the join fits within the memory of a single node. We present results for several implementations, both in pure Map Reduce and in Pig, both based on HDFS. We also compare the distributed performance of those implementations with a single-node implementation in MySQL. Results show that the Pig implementations fall short of the pure Map Reduce versions by a bigger margin than expected. Moreover, we notice that Map tasks seem to be the element that influences performance the most, especially for the potentially more efficient methods. So far, we have achieved the best performance using a singleton pattern join. However, there are reasons to believe that this performance can still be improved with better control of the number of Map tasks.
  • Keywords
    SQL; parallel programming; storage management; HDFS; Hadoop; Map Reduce; MySQL; Pig; single node memory size; Bioinformatics; Computer architecture; Context; Indexes; Java; Ontologies; Semantics; Join; MapReduce; Semantic Expansion
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications Workshops (WAINA), 2012 26th International Conference on
  • Conference_Location
    Fukuoka
  • Print_ISBN
    978-1-4673-0867-0
  • Type
    conf
  • DOI
    10.1109/WAINA.2012.20
  • Filename
    6185392