Author_Institution :
California Univ., Santa Barbara, CA, USA
Abstract :
The ability to store and query uncertain information is of great benefit to databases that infer values from a set of observations, including databases of moving objects, sensor readings, historical business transactions, and biomedical images. These observations are often inexact to begin with, and even if they are exact, a set of observations of an attribute of an object is better represented by a probability distribution than by a single number, such as a mean. In this paper, we present adaptive, piecewise-linear approximations (APLAs), which represent arbitrary probability distributions compactly with guaranteed quality. We also present the APLA-tree, an index structure for APLAs. Because APLA is more precise than existing approximation techniques, the APLA-tree can answer probabilistic range queries twice as fast. APLA generalizes to multiple dimensions, and the APLA-tree can index multivariate distributions using either one-dimensional or multidimensional APLAs. Finally, we propose a new definition of k-NN queries on uncertain data. The new definition allows APLA and the APLA-tree to answer k-NN queries quickly, even on arbitrary probability distributions. No efficient k-NN search was previously possible on such distributions.
Keywords :
approximation theory; database indexing; query processing; statistical distributions; APLA; adaptive piecewise-linear approximations; arbitrary probability distribution indexing; biomedical images; historical business transactions; index multivariate distributions; k-NN queries; moving objects; probabilistic range queries; query uncertain information; sensor readings; uncertain data; Biomedical imaging; Biosensors; Image databases; Image sensors; Indexing; Multidimensional systems; Piecewise linear techniques; Probability distribution; Sensor phenomena and characterization; Transaction databases;