  • DocumentCode
    3334499
  • Title
    High-dimensional similarity retrieval using dimensional choice
  • Author
    Tahmoush, Dave; Samet, Hanan
  • Author_Institution
    Univ. of Maryland, College Park, MD
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    330
  • Lastpage
    337
  • Abstract
    Several kinds of information can be used to improve the efficiency of similarity searches on high-dimensional data. The most commonly used is the distribution of the data itself, but dimensional choice based on the information in the query, together with the parameters of the distribution, can provide an effective improvement in query processing speed and storage. This method can reduce dimension by as much as a factor of n, the number of data points in the database, relative to sequential search. We demonstrate that the curse of dimensionality depends not on the dimension of the data itself but primarily on the effective dimension of the distance function. We also introduce a new distance function that uses fewer dimensions of the higher-dimensional space to produce a maximal lower-bound distance that approximates the full distance function. On a prostate cancer data set, this work demonstrates significant dimension reduction: up to 70% with an improvement in accuracy, or over 99% with only a 6% loss in accuracy.
  • Keywords
    query processing; UL-Distance; dimensional choice; distance function; high-dimensional similarity retrieval; prostate cancer data set; similarity search; Bioinformatics; Databases; Density functional theory; Educational institutions; Histograms; Information retrieval; Nearest neighbor searches; Probability density function; Prostate cancer
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-2161-9
  • Electronic_ISBN
    978-1-4244-2162-6
  • Type
    conf
  • DOI
    10.1109/ICDEW.2008.4498342
  • Filename
    4498342