• DocumentCode
    787167
  • Title
    Clustering for approximate similarity search in high-dimensional spaces
  • Author
    Li, Chen ; Chang, Edward ; Garcia-Molina, Hector ; Wiederhold, Gio
  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • Volume
    14
  • Issue
    4
  • fYear
    2002
  • Firstpage
    792
  • Lastpage
    808
  • Abstract
    We present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one would like to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few I/Os and perform significantly better than other approaches. Our scheme is based on finding clusters and then building a simple but efficient index for them. We analyze the trade-offs involved in clustering and building such an index structure, and present extensive experimental results.
  • Keywords
    computational complexity; database indexing; pattern clustering; query processing; tree data structures; very large databases; visual databases; Clindex; approximate similarity search; clustering; experimental results; high recall; high-dimensional search spaces; image database; indexing; large databases; time complexity; tree-like index structures; Buildings; Clustering algorithms; Content based retrieval; Geometry; Image retrieval; Indexing; Information retrieval; Nearest neighbor searches; Object detection; Search engines
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2002.1019214
  • Filename
    1019214