• DocumentCode
    897458
  • Title
    Domains and active domains: what this distinction implies for the estimation of projection sizes in relational databases
  • Author
    Ciaccia, Paolo; Maio, Dario
  • Author_Institution
    Dipartimento di Elettronica, Inf. e Sistemistica, Bologna Univ., Italy
  • Volume
    7
  • Issue
    4
  • fYear
    1995
  • fDate
    8/1/1995 12:00:00 AM
  • Firstpage
    641
  • Lastpage
    655
  • Abstract
    Database optimizers require statistical information about data distributions in order to evaluate result sizes and access plan costs for processing user queries. In this context, we consider the problem of estimating the size of the projections of a database relation when measures of attribute domain cardinalities are maintained in the system. Our main theoretical contribution is a new formal model, the AD (active domain) model, which is valid under the hypotheses of attribute independence and uniform distribution of attribute values. It is derived by considering the distinction between the time-invariant domain (the set of values that an attribute can assume) and the time-dependent (“active”) domain (the set of values that are actually assumed at a given time). Earlier models developed under the same assumptions are shown to be formally incorrect. Since the AD model is computationally very demanding, we also introduce an approximate, easy-to-compute model, the A2D (approximate active domain) model, which, unlike previous approximations, yields low errors over the entire parameter space of active domain cardinalities. Finally, we extend the A2D model to the case of nonuniform distributions and present experimental results confirming the good behavior of the model.
  • Keywords
    active databases; database theory; error statistics; query processing; relational databases; A2D model; AD model; active domains; approximate active domain model; attribute domain cardinalities; attribute independence; combinatorial models; data distributions; database optimizers; error estimate; nonuniform distributions; parameter space errors; plan costs; projection size estimation; query optimization; relational databases; statistical information; statistical profile; time-dependent domain; time-invariant domain; uniform attribute values distribution; user query processing; Aggregates; Computational modeling; Cost function; Estimation error; Histograms; Query processing; Relational databases; Size measurement;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/69.404035
  • Filename
    404035