Title :
Implementation of the KNN algorithm based on Hadoop
Author :
Shengpeng Lu; Weiqin Tong; Zuanjian Chen
Author_Institution :
School of Computer Engineering and Science, Shanghai University, Shanghai, P.R. China
fDate :
7/1/2015 12:00:00 AM
Abstract :
The K-Nearest Neighbors (KNN) algorithm is simple, effective and linear in the field of text classification. Its major limitation is its time complexity. Hadoop provides distributed processing of large data sets across clusters of computers using simple programming models. In this paper, the KNN algorithm is improved by implementing it on Hadoop, taking advantage of distributed processing and the linear nature of the KNN algorithm. Speedups are compared across different numbers of nodes for each data size. The experimental results show that the parallel KNN algorithm achieves a good speedup curve when at least three nodes are used. This implementation can also broaden the range of applications of the KNN algorithm.
Conference_Title :
Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
DOI :
10.1049/cp.2015.0265
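The abstract describes parallelizing KNN classification by distributing the work over cluster nodes. The following is a minimal sketch, in plain Python rather than Hadoop's Java MapReduce API, of how KNN can be split into a map phase and a reduce phase. It is not the authors' implementation: the function names (`knn_map`, `knn_reduce`, `parallel_knn`), the use of Euclidean distance on numeric feature vectors, and the list of in-memory splits standing in for data blocks on separate nodes are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_map(split, query, k):
    """Map phase (hypothetical): each node scans only its own split of the
    training data and emits its k closest (distance, label) candidates."""
    return heapq.nsmallest(k, ((euclidean(vec, query), label) for vec, label in split))


def knn_reduce(partials, k):
    """Reduce phase (hypothetical): merge the per-split candidates, keep the
    global k nearest, and return the majority label among them."""
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def parallel_knn(splits, query, k):
    """Run the map phase on every split, then a single reduce.
    On a real cluster the map calls would run concurrently on separate nodes."""
    partials = [knn_map(split, query, k) for split in splits]
    return knn_reduce(partials, k)


if __name__ == "__main__":
    # Toy labelled vectors (invented for illustration only).
    training = [
        ([0.0, 0.0], "sports"),
        ([0.1, 0.2], "sports"),
        ([5.0, 5.0], "politics"),
        ([5.1, 4.9], "politics"),
        ([0.2, 0.1], "sports"),
    ]
    splits = [training[:3], training[3:]]  # simulate two nodes
    print(parallel_knn(splits, [0.05, 0.1], k=3))  # prints "sports"
```

Each mapper only needs to emit k candidates, because the global k nearest neighbors are always contained in the union of the per-split top-k lists. This keeps the data shuffled to the reducer to k records per split per query, while the expensive distance computations, which grow linearly with the training set, are divided among the nodes.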