• DocumentCode
    1454100
  • Title

    Adaptive Cluster Distance Bounding for High-Dimensional Indexing

  • Author

    Ramaswamy, Sharadh ; Rose, Kenneth

  • Author_Institution
    Res. Staff, Mayachitra, Inc., Santa Barbara, CA, USA
  • Volume
    23
  • Issue
    6
  • fYear
    2011
  • fDate
    6/1/2011 12:00:00 AM
  • Firstpage
    815
  • Lastpage
    830
  • Abstract
    We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. We note that indexing by “vector approximation” (VA-File), which was proposed as a technique to combat the “Curse of Dimensionality,” employs scalar quantization, and hence necessarily ignores dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand, exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster based index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead and is applicable to euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations, by factors reaching 100X and more.
  • Keywords
    indexing; pattern clustering; Mahalanobis similarity measurement; adaptive cluster distance bounding; cluster adaptive distance; clustering framework; high dimensional indexing; scalar quantization; vector approximation; Image databases; Image retrieval; Indexing; Information retrieval; Information systems; Multimedia databases; Nearest neighbor searches; Optical signal processing; Quantization; Space technology; Multimedia databases; clustering; image databases.; indexing methods; similarity measures;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2010.59
  • Filename
    5438994