• DocumentCode
    1784848
  • Title
    Three-way joins on MapReduce: An experimental study
  • Author
    Kimmett, Ben; Thomo, Alex; Venkatesh, Svetha
  • Author_Institution
    Univ. of Victoria, Victoria, BC, Canada
  • fYear
    2014
  • fDate
    7-9 July 2014
  • Firstpage
    227
  • Lastpage
    232
  • Abstract
    We study three-way joins on MapReduce. Joins are useful in a multitude of applications, from data integration and traversing social networks to mining graphs and automata-based constructions. However, joins are expensive, even for moderate data sets; efficient algorithms are needed to compute joins in a distributed fashion on clusters of many machines. MapReduce has become an increasingly popular distributed computing system and programming paradigm. We consider a state-of-the-art MapReduce multi-way join algorithm by Afrati and Ullman and show when it is appropriate for use on very large data sets. Through a detailed experimental study, we demonstrate that this algorithm scales much better than the original paper suggests. However, if the join result needs to be summarized or aggregated, rather than only enumerated, the aggregation step can be integrated into a cascade of two-way joins, which makes the cascade more efficient than the Afrati-Ullman algorithm and thus the preferred solution.
  • Keywords
    distributed algorithms; distributed programming; MapReduce multiway join algorithm; automata-based constructions; data integration; distributed computing system; distributed programming paradigm; graph mining; social networks; three-way join algorithm; very large data sets; Automata; Google; Internet
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on
  • Conference_Location
    Chania
  • Type
    conf
  • DOI
    10.1109/IISA.2014.6878811
  • Filename
    6878811