• DocumentCode
    3495394
  • Title

    A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality

  • Author

    Wang, Xueyi

  • Author_Institution
    Dept. of Math. & Comput. Sci., Northwest Nazarene Univ., Nampa, ID, USA
  • fYear
    2011
  • fDate
    July 31 2011-Aug. 5 2011
  • Firstpage
    1293
  • Lastpage
    1299
  • Abstract
    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 106 records and 104 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
  • Keywords
    learning (artificial intelligence); pattern clustering; trees (mathematics); ball-tree; complex tree structures; exact k-NN algorithm; exact k-nearest neighbors algorithm; feature space; high dimensional search; k-means clustering method; k-means for k-nearest neighbors; kd-trees; machine learning method; metric trees; tree-based k-NN algorithms; triangle inequality; Acceleration; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Machine learning algorithms; Search problems; Training;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), The 2011 International Joint Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4244-9635-8
  • Type

    conf

  • DOI
    10.1109/IJCNN.2011.6033373
  • Filename
    6033373