Title :
MapReduce join strategies for key-value storage
Author :
Van Hieu, Duong ; Smanchat, Sucha ; Meesad, Phayung
Author_Institution :
Fac. of Inf. Technol., King Mongkut's Univ. of Technol. North Bangkok, Bangkok, Thailand
Abstract :
This paper analyses the MapReduce join strategies, known as map-side and reduce-side joins, that are used for big data analysis and mining. The most commonly used joins are analysed, namely theta-join algorithms including the all-pair partition join, repartition join, broadcasting join, semi-join, and per-split semi-join. The paper can serve as a guideline for MapReduce application developers when selecting a join strategy. The analysis of these join strategies is accompanied by comprehensive examples.
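The following minimal in-memory Python sketch makes the map-side versus reduce-side distinction concrete. It is not the paper's implementation. It covers only two of the named strategies: a repartition (reduce-side) join and a broadcasting (map-side) join. The toy relations `R` and `S`, the function names, and the list-of-splits input are hypothetical. The map, shuffle, and reduce phases are simulated with ordinary Python collections rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy relations: (join_key, payload) pairs.
R = [(1, "r1"), (2, "r2"), (2, "r3"), (4, "r4")]
S = [(2, "s1"), (3, "s2"), (4, "s3"), (4, "s4")]


def repartition_join(r, s):
    """Reduce-side (repartition) join, simulated in memory."""
    # Map phase: tag every record with its source relation.
    mapped = [(k, ("R", v)) for k, v in r] + [(k, ("S", v)) for k, v in s]
    # Shuffle phase: group all tagged records by join key.
    groups = defaultdict(list)
    for k, tagged in mapped:
        groups[k].append(tagged)
    # Reduce phase: split each group by tag and emit the cross product.
    out = []
    for k in sorted(groups):
        left = [v for tag, v in groups[k] if tag == "R"]
        right = [v for tag, v in groups[k] if tag == "S"]
        out.extend((k, a, b) for a, b in product(left, right))
    return out


def broadcast_join(small, splits):
    """Map-side (broadcasting) join: the small relation is replicated to every mapper."""
    # Each mapper builds a hash table from the broadcast small relation.
    table = defaultdict(list)
    for k, v in small:
        table[k].append(v)
    out = []
    # Each split of the large relation is joined independently; no shuffle or reduce.
    for split in splits:
        for k, v in split:
            for sv in table.get(k, []):
                out.append((k, v, sv))
    return out


print(repartition_join(R, S))
print(broadcast_join(S, [R[:2], R[2:]]))
```

Both calls produce the same joined tuples. The repartition join must shuffle every record of both relations to the reducers. The broadcasting join avoids the shuffle entirely, but only when the replicated relation is small enough to fit in each mapper's memory.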
Keywords :
Big Data; data analysis; data mining; parallel processing; MapReduce join strategies; big data analysis; big data mining; broadcasting join; key-value storage; map-side joins; per-split semijoin; reduce-side joins; repartition join; theta-join algorithms; MapReduce; NoSQL; join strategy;
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
Conference_Location :
Chon Buri
Print_ISBN :
978-1-4799-5821-4
DOI :
10.1109/JCSSE.2014.6841861