• DocumentCode
    1055890
  • Title

    Bounded Approximation: A New Criterion for Dimensionality Reduction Approximation in Similarity Search

  • Author

    Vu, Khanh ; Hua, Kien A. ; Cheng, Hao ; Lang, Sheau-Dong

  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL
  • Volume
    20
  • Issue
    6
  • fYear
    2008
  • fDate
    6/1/2008 12:00:00 AM
  • Firstpage
    768
  • Lastpage
    783
  • Abstract
    We examine the problem of efficient distance-based similarity search over high-dimensional data. We show that a promising approach to this problem is to reduce dimensions and allow fast approximation. Conventional reduction approaches, however, entail a significant shortcoming: The approximation volume extends across the dataspace, which causes overestimation of retrieval sets and impairs performance. This paper focuses on a new criterion for dimensionality reduction methods: bounded approximation. We show that this requirement can be accomplished by a novel nonlinear transformation scheme that extracts two important parameters from the data. We devise two approximation formulations, namely, rectangular and spherical range search, each corresponding to a closed volume around the original search sphere. We discuss in detail how we can derive tight bounds for the parameters and prove further results, as well as highlight insights into the problems and our proposed solutions. To demonstrate the benefits of the new criterion, we study the effects of (un)boundedness on approximation performance, including selectivity, error toleration, and efficiency. Extensive experiments confirm the superiority of this technique over recent state-of-the-art schemes.
  • Keywords
    approximation theory; query processing; bounded approximation; dimensionality reduction approximation; nonlinear transformation scheme; similarity search; Information Search and Retrieval; Information Storage and Retrieval; Search process;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2008.30
  • Filename
    4445668