• DocumentCode
    2080239
  • Title

    Approximate confidence computation in probabilistic databases

  • Author

    Olteanu, Dan ; Huang, Jiewen ; Koch, Christoph

  • Author_Institution
    Oxford Univ. Comput. Lab., Oxford, UK
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    145
  • Lastpage
    156
  • Abstract
    This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be effectively used to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without self-joins on tuple-independent probabilistic databases: In this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-art approximation techniques by several orders of magnitude.
  • Keywords
    approximation theory; probability; relational algebra; SPROUT query engine; Shannon expansion; approximate confidence computation; deterministic approximation algorithm; discrete probability distributions; discrete random variables; probabilistic databases; relational algebra queries; Algebra; Approximation algorithms; Distributed computing; Engines; Partitioning algorithms; Polynomials; Probability distribution; Random variables; Relational databases; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-5445-7
  • Electronic_ISBN
    978-1-4244-5444-0
  • Type

    conf

  • DOI
    10.1109/ICDE.2010.5447826
  • Filename
    5447826