• DocumentCode
    3324827
  • Title
    On High Dimensional Indexing of Uncertain Data
  • Author
    Aggarwal, Charu C.; Yu, Philip S.
  • Author_Institution
    T.J. Watson Research Center, IBM, Hawthorne, NY
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1460
  • Lastpage
    1461
  • Abstract
    In this paper, we examine the problem of distance function computation and indexing of uncertain data in high dimensionality for nearest neighbor and range queries. Because of the inherent noise in uncertain data, traditional distance functions such as the Lq-metric and its probabilistic variants are not qualitatively effective. This problem is further magnified by the sparsity issue in high dimensionality. We examine methods of computing distance functions for high dimensional data which are qualitatively effective and friendly to the use of indexes. We also show how to construct an effective index structure in order to handle uncertain similarity and range queries in high dimensionality. Typical range queries in high dimensional space use only a subset of the ranges in order to resolve the queries. Furthermore, it is often desirable to run similarity queries with only a subset of the large number of dimensions. Such queries are difficult to resolve with traditional index structures, which use the entire set of dimensions. We propose query-processing techniques which use effective search methods on the index in order to compute the final results. We discuss the experimental results on a number of real and synthetic data sets in terms of effectiveness and efficiency. We show that the proposed distance measures are not only more effective than traditional Lq-norms, but can also be computed more efficiently over our proposed index structure.
  • Keywords
    indexing; query processing; Lq metric; distance function computation; high dimensional indexing; index structure; nearest neighbor; query processing techniques; synthetic data sets; uncertain data; Data mining; Drives; Indexing; Nearest neighbor searches; Noise measurement; Probability density function; Probability distribution; Search methods; Statistical analysis; Uncertainty;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497589
  • Filename
    4497589