Title :
Estimating Aggregates over Multiple Sets
Author :
Cohen, Edith ; Kaplan, Haim
Author_Institution :
AT&T Labs-Research, Florham Park, NJ
Abstract :
Many datasets, including market basket data, text or hypertext documents, and measurement data collected in different nodes or time periods, are modeled as a collection of sets over a ground set of (weighted) items. We consider the problem of estimating basic aggregates such as the weight or selectivity of a subpopulation of the items. We extend classic summarization techniques based on sampling to this scenario when we have multiple sets and selection predicates based on membership in particular sets.
Keywords :
document handling; hypermedia; hypertext documents; market basket data; measurement data collected; multiple sets; summarization techniques; Aggregates; Computer science; Content based retrieval; Costs; Data mining; Frequency; Sampling methods; Time measurement; USA Councils; Web pages; approximate query processing; sampling; similarity; sketching
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.110