Title :
A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality
Author_Institution :
Dept. of Math. & Comput. Sci., Northwest Nazarene Univ., Nampa, ID, USA
fDate :
July 31 2011-Aug. 5 2011
Abstract :
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 106 records and 104 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
Keywords :
learning (artificial intelligence); pattern clustering; trees (mathematics); ball-tree; complex tree structures; exact k-NN algorithm; exact k-nearest neighbors algorithm; feature space; high dimensional search; k-means clustering method; k-means for k-nearest neighbors; kd-trees; machine learning method; metric trees; tree-based k-NN algorithms; triangle inequality; Acceleration; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Machine learning algorithms; Search problems; Training;
Conference_Titel :
Neural Networks (IJCNN), The 2011 International Joint Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
978-1-4244-9635-8
DOI :
10.1109/IJCNN.2011.6033373