• DocumentCode
    2456422
  • Title

    A General Method for Estimating Correlated Aggregates over a Data Stream

  • Author

    Tirthapura, Srikanta ; Woodruff, David P.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
  • fYear
    2012
  • fDate
    1-5 April 2012
  • Firstpage
    162
  • Lastpage
    173
  • Abstract
    On a stream of two dimensional data items (x,y) where x is an item identifier, and y is a numerical attribute, a correlated aggregate query requires us to first apply a selection predicate along the second (y) dimension, followed by an aggregation along the first (x) dimension. For selection predicates of the form (y <; c) or (y >;, c), where parameter c is provided at query time, we present new streaming algorithms and lower bounds for estimating statistics of the resulting sub stream of elements that satisfy the predicate. We provide the first sub linear space algorithms for a large family of statistics in this model, including frequency moments. We experimentally validate our algorithms, showing that their memory requirements are significantly smaller than existing linear storage schemes for large datasets, while simultaneously achieving fast per-record processing time. We also study the problem when the items have weights. Allowing negative weights allows for analyzing values which occur in the symmetric difference of two datasets. We give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data(before the query is presented). We complement this with a small space algorithm which uses a logarithmic number of passes.
  • Keywords
    data handling; query processing; statistics; correlated aggregate estimation; correlated aggregate query; frequency moments; general method; item identifier; linear storage schemes; negative weights; numerical attribute; pass logarithmic number; selection predicate; statistics estimation; streaming algorithms; sublinear space algorithms; two dimensional data item stream; Aggregates; Complexity theory; Data structures; Estimation; Frequency estimation; Silicon; USA Councils;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2012 IEEE 28th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-0042-1
  • Type

    conf

  • DOI
    10.1109/ICDE.2012.62
  • Filename
    6228081