DocumentCode
3249491
Title
Efficient progressive sampling for association rules
Author
Parthasarathy, Srinivasan
Author_Institution
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fYear
2002
fDate
2002
Firstpage
354
Lastpage
361
Abstract
In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated at some cost to accuracy. However this loss to accuracy is often difficult to measure and characterize since the exact nature of the learning curve (accuracy vs. sample size) is parameter and data dependent, i.e., we do not know a priori what sample size is needed to achieve a desired accuracy on a particular dataset for a particular set of parameters. In this article we propose the use of progressive sampling, to determine the required sample size for association rule mining. We first show that a naive application of progressive sampling is not very efficient for association rule mining. We then present a refinement based on equivalence classes, that seems to work extremely well in practice and is able to converge to the desired sample size very quickly and very accurately. An additional novelty of our approach is the definition of a support-sensitive, interactive measure of accuracy across progressive samples.
Keywords
data mining; equivalence classes; fractals; association rules; data mining; dataset; equivalence classes; progressive sampling; rule mining; Association rules; Costs; Data mining; Databases; Delay; Information science; Loss measurement; Pressing; Sampling methods; Size measurement;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1183923
Filename
1183923
Link To Document