• DocumentCode
    3530876
  • Title

    Curse of Dimensionality in Pivot Based Indexes

  • Author

    Volnyansky, Ilya ; Pestov, Vladimir

  • Author_Institution
    Dept. of Math. & Stat., Univ. of Ottawa, Ottawa, ON, Canada
  • fYear
    2009
  • fDate
    29-30 Aug. 2009
  • Firstpage
    39
  • Lastpage
    46
  • Abstract
    We offer a theoretical validation of the curse of dimensionality in the pivot based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot based indexing schemes is performed on a sequence of datasets modeled as samples picked in i.i.d. fashion from a sequence of metric spaces. We allow the size of the dataset to grow in relation to dimension, such that the dimension is superlogarithmic but subpolynomial in the size of the dataset. The number of pivots is sublinear in the size of the dataset. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces in the sense of concentration of measure phenomenon is linear in dimension, then the performance of similarity search pivot based indexes is asymptotically linear in the size of the dataset.
  • Keywords
    data structures; database indexing; statistical analysis; datasets sequence model; dimensionality curse validation; pivot based indexing; similarity search; statistical learning framework; Bismuth; Costs; Databases; Extraterrestrial measurements; Extraterrestrial phenomena; Indexing; Mathematics; Probability; Statistical learning; Statistics; Concentration of Measure; Curse of dimensionality; Data structures; Similarity search;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Similarity Search and Applications, 2009. SISAP '09. Second International Workshop on
  • Conference_Location
    Prague
  • Print_ISBN
    978-0-7695-3765-8
  • Type

    conf

  • DOI
    10.1109/SISAP.2009.9
  • Filename
    5271952