DocumentCode :
3323591
Title :
Handling Uncertain Data in Array Database Systems
Author :
Ge, Tingjian ; Zdonik, Stan
Author_Institution :
Brown Univ., Providence, RI
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
1140
Lastpage :
1149
Abstract :
Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that had dominated the data management area for three decades. Array database systems have a specialized architecture to address this problem. Since the data is typically an approximation of reality, it is important to be able to handle imprecision and uncertainty in an efficient and provably accurate way. We propose a discrete approach for value distributions and adopt a standard metric (i.e., variation distance) in probability theory to measure the quality of a result distribution. We then propose a novel algorithm that has a provable upper bound on the variation distance between its result distribution and the "ideal" one. Complementary to that, we advocate the usage of a "statistical mode" suitable for the results of many queries and applications, which is also much more efficient for execution. We show how the statistical mode also presents interesting predicate evaluation strategies. In addition, extensive experiments are performed on real world datasets to evaluate our algorithms.
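The abstract names variation distance as the quality metric for a result distribution. As a rough illustration only, and not the authors' algorithm, the sketch below computes that distance for two discrete distributions stored as value-to-probability dictionaries. The helper names (`variation_distance`, `add_distributions`), the dictionary representation, the independence assumption, and the sample sensor values are all assumptions made for this example.

```python
from collections import defaultdict


def variation_distance(p, q):
    """Variation distance between two discrete distributions.

    p and q map each value to its probability; a value missing from
    one of them is treated as having probability 0 there.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def add_distributions(p, q):
    """Distribution of X + Y for independent discrete X ~ p and Y ~ q.

    This is a plain discrete convolution, used here only to produce
    an exact reference distribution to compare against.
    """
    out = defaultdict(float)
    for x, px in p.items():
        for y, py in q.items():
            out[x + y] += px * py
    return dict(out)


# Hypothetical uncertain readings (illustrative values, not from the paper).
a = {20: 0.5, 21: 0.5}
b = {1: 0.25, 2: 0.75}

exact = add_distributions(a, b)            # the "ideal" result distribution
approx = {21: 0.25, 22: 0.5, 23: 0.25}     # some cheaper approximation of it

print(variation_distance(exact, approx))
```

A distance of 0 means the two distributions agree exactly, and 1 means their supports are disjoint, so an upper bound on this quantity limits how far any event's probability under the approximate result can deviate from the ideal one.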
Keywords :
data handling; distributed databases; array database system; data handling needs; data management area; probability theory; real world datasets; Convolution; Data handling; Database systems; Deductive databases; Distributed computing; Intelligent sensors; Probability; Temperature sensors; Uncertainty; Upper bound;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
Type :
conf
DOI :
10.1109/ICDE.2008.4497523
Filename :
4497523