DocumentCode
3323591
Title
Handling Uncertain Data in Array Database Systems
Author
Ge, Tingjian; Zdonik, Stan
Author_Institution
Brown Univ., Providence, RI
fYear
2008
fDate
7-12 April 2008
Firstpage
1140
Lastpage
1149
Abstract
Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that has dominated data management for three decades. Array database systems have a specialized architecture to address this problem. Since the data is typically an approximation of reality, it is important to handle imprecision and uncertainty in an efficient and provably accurate way. We propose a discrete approach for value distributions and adopt a standard metric from probability theory, the variation distance, to measure the quality of a result distribution. We then propose a novel algorithm that has a provable upper bound on the variation distance between its result distribution and the "ideal" one. Complementary to that, we advocate the use of a "statistical mode" that is suitable for the results of many queries and applications and is also much more efficient to execute. We show how the statistical mode also enables interesting predicate evaluation strategies. Finally, we evaluate our algorithms through extensive experiments on real-world datasets.
Keywords
data handling; distributed databases; array database system; data handling needs; data management area; probability theory; real world datasets; Convolution; Data handling; Database systems; Deductive databases; Distributed computing; Intelligent sensors; Probability; Temperature sensors; Uncertainty; Upper bound
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497523
Filename
4497523