• DocumentCode
    9887
  • Title

    An Effective GPU-Based Approach to Probabilistic Query Confidence Computation

  • Author

    Serra, Emmanuele ; Spezzano, Francesca

  • Author_Institution
    Univ. of Maryland Inst. for Adv. Comput. Studies (UMIACS), College Park, MD, USA
  • Volume
    27
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 1 2015
  • Firstpage
    17
  • Lastpage
    31
  • Abstract
    In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate probabilistic queries confidence computation are based on Monte Carlo simulation, or formula compilation into decision diagrams (e.g., d-trees). The former executes a polynomial, but with too many, iterations, while the latter is polynomial for easy queries, but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations and proposed an efficient parallel version that we implemented on GPU. Thanks to the elevated degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we managed to reduce significantly the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is so efficient as to be comparable with the formula compilation approach, but with the significant advantage of avoiding exponential behavior.
  • Keywords
    Monte Carlo methods; computational complexity; database management systems; decision diagrams; graphics processing units; parallel processing; polynomials; probability; query processing; GPU-based approach; Monte Carlo simulation; NP-hard problem; RFID systems; approximate schema mappings; biomedical data management; data cleaning; decision diagrams; formula compilation; optimized Monte Carlo algorithm; parallelism degree; polynomial; probabilistic data management; probabilistic databases; probabilistic query confidence computation; propositional formula probability; query evaluation; scientific data management; sensor networks; sequential Monte Carlo algorithm; Approximation algorithms; Databases; Graphics processing units; Instruction sets; Monte Carlo methods; Probabilistic logic; Random variables; GPU-computing; Probabilistic databases; monte carlo; parallel algorithms; probabilistic reasoning; query processing;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2014.2324571
  • Filename
    6817579