Title :
An Effective GPU-Based Approach to Probabilistic Query Confidence Computation
Author :
Serra, Emmanuele ; Spezzano, Francesca
Author_Institution :
Univ. of Maryland Inst. for Adv. Comput. Studies (UMIACS), College Park, MD, USA
Abstract :
In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate probabilistic queries confidence computation are based on Monte Carlo simulation, or formula compilation into decision diagrams (e.g., d-trees). The former executes a polynomial, but with too many, iterations, while the latter is polynomial for easy queries, but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations and proposed an efficient parallel version that we implemented on GPU. Thanks to the elevated degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we managed to reduce significantly the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is so efficient as to be comparable with the formula compilation approach, but with the significant advantage of avoiding exponential behavior.
Keywords :
Monte Carlo methods; computational complexity; database management systems; decision diagrams; graphics processing units; parallel processing; polynomials; probability; query processing; GPU-based approach; Monte Carlo simulation; NP-hard problem; RFID systems; approximate schema mappings; biomedical data management; data cleaning; decision diagrams; formula compilation; optimized Monte Carlo algorithm; parallelism degree; polynomial; probabilistic data management; probabilistic databases; probabilistic query confidence computation; propositional formula probability; query evaluation; scientific data management; sensor networks; sequential Monte Carlo algorithm; Approximation algorithms; Databases; Graphics processing units; Instruction sets; Monte Carlo methods; Probabilistic logic; Random variables; GPU-computing; Probabilistic databases; monte carlo; parallel algorithms; probabilistic reasoning; query processing;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2014.2324571