DocumentCode :
3305677
Title :
Novel algorithms for computing medians and other quantiles of disk-resident data
Author :
Fu, Lixin ; Rajasekaran, Sanguthevar
Author_Institution :
Dept. of Comput. & Inf. Sci., Florida Univ., Gainesville, FL, USA
fYear :
2001
fDate :
2001
Firstpage :
145
Lastpage :
154
Abstract :
In data warehousing applications, numerous OLAP queries involve the processing of holistic operations such as computing the “top N”, median, etc. Efficient implementations of these operations are hard to come by. Several algorithms have been proposed in the literature that estimate various quantiles of disk-resident data. Two such recent algorithms are based on sampling. The authors present two novel and efficient quantiling algorithms, Deterministic Bucketing (DB) and Randomized Bucketing (RB). We have analyzed the performance of DB and RE and extended the analysis of the sampling done in prior algorithms. We have conducted extensive experiments to compare all these four algorithms. Our experimental data indicate that our new algorithms outperform prior algorithms not only in the overall run time but also in accuracy. The new algorithms can be used either as one-pass algorithms to accurately estimate quantiles or as algorithms for computing the quantiles exactly
Keywords :
data mining; data warehouses; sampling methods; Deterministic Bucketing; OLAP queries; Randomized Bucketing; data warehousing applications; disk-resident data; holistic operations; medians; one-pass algorithms; quantiles; quantiling algorithms; sampling; Algorithm design and analysis; Partitioning algorithms; Performance analysis; Phase change random access memory; Sampling methods; Sorting; Upper bound; Warehousing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Database Engineering and Applications, 2001 International Symposium on.
Conference_Location :
Grenoble
Print_ISBN :
0-7695-1140-6
Type :
conf
DOI :
10.1109/IDEAS.2001.938081
Filename :
938081
Link To Document :
بازگشت