Title :
The k-Nearest Neighbor Algorithm Using MapReduce Paradigm
Author :
Prajesh P. Anchalia;Kaushik Roy
Author_Institution :
Dept. of Comput. Sci. &
Abstract :
Data in any form is a valuable resource, but data collected in the real world is more often than not random and unstructured. Hence, to realize the true potential of data as a resource, we must transform it so that meaningful information can be retrieved from it. Data mining fulfills this need. Today there is a need not only for efficient data mining techniques to process large volumes of data, but also for a means of meeting the computational requirements of processing such volumes. In this paper we implement an effective data mining technique, the k-Nearest Neighbor method, in a distributed computing environment running Apache Hadoop, which uses the MapReduce paradigm to process high-volume data.
Keywords :
"Data mining","Testing","Classification algorithms","Training data","Training","Distributed computing","Algorithm design and analysis"
Conference_Title :
Intelligent Systems, Modelling and Simulation (ISMS), 2014 5th International Conference on
DOI :
10.1109/ISMS.2014.94