Title :
Comparison of a sequential and a MapReduce approach to joining large datasets
Author :
Lalic, Marko ; Memic, Emina ; Kesan, Faruk ; Gondzic, Edita ; Smajic, Nermin ; Nosovic, Novica
Author_Institution :
Dept. for Comput. & Inf., Univ. of Sarajevo, Sarajevo, Bosnia-Herzegovina
Abstract :
The MapReduce programming model is considered one of the most significant advances in the parallel processing of massive data. The growing volume of data being processed and stored has created a need for more efficient solutions to common problems, one of which is performing a join operation on two interconnected datasets. In this paper, a classic sequential solution to this problem is compared with a MapReduce approach in order to determine the relative advantages of each. The sequential application is shown to be prohibitively slow even for datasets that are small by today's standards. Furthermore, a MapReduce cluster of five Amazon EC2 nodes is shown to process, in the same amount of time, ten times as much data as the sequential application.
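To make the comparison concrete, the two join strategies can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which sequential algorithm was used, so a nested-loop join is assumed here, and the MapReduce side is a single-process simulation of a reduce-side join rather than Hadoop code. All function names and the sample records are hypothetical.

```python
from collections import defaultdict


def sequential_join(left, right):
    """Assumed sequential baseline: nested-loop join on the first field."""
    return [(key_l, val_l, val_r)
            for key_l, val_l in left
            for key_r, val_r in right
            if key_l == key_r]


def map_phase(records, tag):
    """Map: emit (join key, (source tag, value)) for each input record."""
    return [(key, (tag, value)) for key, value in records]


def shuffle(pairs):
    """Shuffle: group mapped pairs by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values):
    """Reduce: pair every left value with every right value sharing the key."""
    lefts = [value for tag, value in tagged_values if tag == "L"]
    rights = [value for tag, value in tagged_values if tag == "R"]
    return [(key, l, r) for l in lefts for r in rights]


def mapreduce_join(left, right):
    """Simulated reduce-side join; each key group could run on a separate node."""
    groups = shuffle(map_phase(left, "L") + map_phase(right, "R"))
    joined = []
    for key, tagged in groups.items():
        joined.extend(reduce_phase(key, tagged))
    return joined


# Hypothetical sample data: (join key, payload)
users = [(1, "ana"), (2, "ben"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]

if __name__ == "__main__":
    print(sorted(sequential_join(users, orders)))
    print(sorted(mapreduce_join(users, orders)))
```

Both functions produce the same joined records. The difference is that the nested loop compares every pair of records in one process, while the reduce step works on independent key groups that a cluster can distribute across nodes.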
Keywords :
data analysis; pattern clustering; programming; very large databases; Amazon EC2 nodes; MapReduce cluster; interconnected datasets; large datasets; massive data processing; programming model; sequential application runtime; Clustering algorithms; Computational modeling; Data processing; Distributed databases; Educational institutions; Facebook; Programming; Hadoop; MapReduce; cluster; distributed join; join
Conference_Title :
Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on
Conference_Location :
Opatija
Print_ISBN :
978-953-233-076-2