مرکز منطقه ای اطلاع رساني علوم و فناوري - A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality

DocumentCode :

3495394

Title :

A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality

Author :

Wang, Xueyi

Author_Institution :

Dept. of Math. & Comput. Sci., Northwest Nazarene Univ., Nampa, ID, USA

fYear :

2011

fDate :

July 31 2011-Aug. 5 2011

Firstpage :

1293

Lastpage :

1299

Abstract :

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10⁶ records and 10⁴ dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

Keywords :

learning (artificial intelligence); pattern clustering; trees (mathematics); ball-tree; complex tree structures; exact k-NN algorithm; exact k-nearest neighbors algorithm; feature space; high dimensional search; k-means clustering method; k-means for k-nearest neighbors; kd-trees; machine learning method; metric trees; tree-based k-NN algorithms; triangle inequality; Acceleration; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Machine learning algorithms; Search problems; Training;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Neural Networks (IJCNN), The 2011 International Joint Conference on

Conference_Location :

San Jose, CA

ISSN :

2161-4393

Print_ISBN :

978-1-4244-9635-8

Type :

conf

DOI :

10.1109/IJCNN.2011.6033373

Filename :

6033373

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3495394