Title :
Join queries on uncertain data: Semantics and efficient processing
Author_Institution :
Dept. of Comput. Sci., Univ. of Kentucky, Lexington, KY, USA
Abstract :
Uncertain data is quite common nowadays in a variety of modern database applications. At the same time, the join operation is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join. They are each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds that are based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real datasets and synthetic datasets.
Keywords :
SQL; data handling; database management systems; dynamic programming; probability; query processing; SQL join operation; d-join data; dynamic programming; indexing mechanism; join queries; optimal condensation scheme; probability theory; query processing algorithms; two-end zigzag; uncertain data handling; v-join data; Algorithm design and analysis; Indexing; Probabilistic logic; Random variables; Reactive power; Uncertainty;
Conference_Titel :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
DOI :
10.1109/ICDE.2011.5767888