• DocumentCode
    478689
  • Title
    Novel approach for nearest neighbor search in high dimensional space
  • Author
    Zhang, Ming ; Alhajj, Reda
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Calgary, Calgary, AB
  • Volume
    2
  • fYear
    2008
  • fDate
    6-8 Sept. 2008
  • Firstpage
    42700
  • Lastpage
    11628
  • Abstract
    Index structures for nearest neighbor search in high-dimensional metric space are mostly built by partitioning the data set based on distances to one or more reference points. Using the constructed index, the search is restricted to a small number of partitions, which avoids an exhaustive search. However, the approaches already described in the literature either ignore the properties of the data distribution or produce non-disjoint partitions; this greatly affects the search efficiency. In this paper, we propose a new index structure that overcomes these disadvantages. The proposed tree structure is constructed by recursively dividing the data set into a nested set of approximate equivalence classes. We also propose a new reference point selection method using principal component analysis (PCA). The conducted analysis and the reported test results demonstrate that the proposed index structure, empowered by the PCA-based reference selection strategy, gives an optimal partition of the data set and greatly improves the search efficiency compared to the VP-tree, which is one of the approaches well documented in the literature.
  • Keywords
    database indexing; equivalence classes; principal component analysis; tree data structures; tree searching; PCA; data set partitioning; equivalence class; high dimensional space; index structure; nearest neighbor search; principal component analysis; reference point selection method; tree structure; Computer science; Content based retrieval; Extraterrestrial measurements; Intelligent structures; Intelligent systems; Nearest neighbor searches; Object recognition; Principal component analysis; Testing; Tree data structures; content-based retrieval; knn search; partitioning; principal component analysis; similarity search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems, 2008. IS '08. 4th International IEEE Conference
  • Conference_Location
    Varna
  • Print_ISBN
    978-1-4244-1739-1
  • Electronic_ISBN
    978-1-4244-1740-7
  • Type
    conf
  • DOI
    10.1109/IS.2008.4670504
  • Filename
    4670504
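
The abstract describes distance-based partitioning around reference points, with the reference chosen using PCA. The record does not give the paper's actual algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: it builds a VP-tree-style index (the baseline named in the abstract, not the proposed structure), splits each node into two disjoint halves at the median distance, and picks the reference as the point with the largest projection on the first principal axis. That selection heuristic and the names `first_principal_axis`, `pick_reference`, `build`, and `nearest` are hypothetical and introduced here for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_principal_axis(points, iters=50):
    """Approximate the top principal component by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = [1.0] * d
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the covariance matrix.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def pick_reference(points):
    """Assumed heuristic: the point lying farthest along the principal axis."""
    mean, v = first_principal_axis(points)
    return max(points, key=lambda p: sum((p[j] - mean[j]) * v[j]
                                         for j in range(len(v))))


def build(points, leaf_size=4):
    """Recursively split points into two disjoint halves by distance to a reference."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    ref = pick_reference(points)
    ranked = sorted(points, key=lambda p: dist(ref, p))
    mid = len(ranked) // 2
    return {
        "ref": ref,
        "radius": dist(ref, ranked[mid]),  # inner <= radius <= outer
        "inner": build(ranked[:mid], leaf_size),
        "outer": build(ranked[mid:], leaf_size),
    }


def nearest(node, q, best=(math.inf, None)):
    """Return (distance, point) of the nearest neighbor of q."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    d_ref = dist(q, node["ref"])
    near, far = ("inner", "outer") if d_ref < node["radius"] else ("outer", "inner")
    best = nearest(node[near], q, best)
    # Triangle inequality: every point on the far side is at least
    # |d_ref - radius| away from q, so skip it when that bound exceeds best.
    if abs(d_ref - node["radius"]) <= best[0]:
        best = nearest(node[far], q, best)
    return best


# Usage: index 200 random 8-dimensional points and query one.
random.seed(0)
data = [tuple(random.random() for _ in range(8)) for _ in range(200)]
index = build(data)
query = tuple(random.random() for _ in range(8))
found_dist, found_point = nearest(index, query)
```

Because the two halves at each node are disjoint, a query descends into the far half only when the triangle-inequality bound allows a closer point there; how often that pruning succeeds depends on the data distribution and on the choice of reference point, which is the issue the abstract raises.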