• DocumentCode
    3249491
  • Title
    Efficient progressive sampling for association rules
  • Author
    Parthasarathy, Srinivasan
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    354
  • Lastpage
    361
  • Abstract
    In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated on, at some cost in accuracy. However, this loss of accuracy is often difficult to measure and characterize, since the exact nature of the learning curve (accuracy vs. sample size) is parameter and data dependent; i.e., we do not know a priori what sample size is needed to achieve a desired accuracy on a particular dataset for a particular set of parameters. In this article we propose the use of progressive sampling to determine the required sample size for association rule mining. We first show that a naive application of progressive sampling is not very efficient for association rule mining. We then present a refinement based on equivalence classes that seems to work extremely well in practice and is able to converge to the desired sample size very quickly and very accurately. An additional novelty of our approach is the definition of a support-sensitive, interactive measure of accuracy across progressive samples.
    (An illustrative sketch of the basic progressive-sampling loop follows this record.)
  • Keywords
    data mining; equivalence classes; fractals; association rules; dataset; progressive sampling; rule mining; Association rules; Costs; Data mining; Databases; Delay; Information science; Loss measurement; Pressing; Sampling methods; Size measurement
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1183923
  • Filename
    1183923
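
As a rough illustration of the idea described in the abstract, the Python sketch below shows a generic progressive-sampling loop: grow a random sample geometrically and stop once the frequent itemsets mined from consecutive samples look sufficiently similar. All names here (mine_frequent_itemsets, sample_similarity, progressive_sample), the growth schedule, and the support-weighted similarity are hypothetical simplifications; they do not reproduce the paper's equivalence-class refinement or its support-sensitive, interactive accuracy measure.

import random
from itertools import combinations

def mine_frequent_itemsets(transactions, min_support):
    """Tiny brute-force Apriori-style miner returning {itemset: support}.
    Written for clarity on toy data, not for efficiency."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate k-itemsets built from items seen in frequent (k-1)-itemsets.
        items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)]
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: v / n for c, v in counts.items() if v / n >= min_support}
        result.update(frequent)
        k += 1
    return result

def sample_similarity(freq_a, freq_b):
    """Support-weighted overlap between two frequent-itemset collections
    (a simplified stand-in for the paper's support-sensitive measure)."""
    all_sets = set(freq_a) | set(freq_b)
    if not all_sets:
        return 1.0
    diff = sum(abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) for s in all_sets)
    total = sum(max(freq_a.get(s, 0.0), freq_b.get(s, 0.0)) for s in all_sets)
    return 1.0 - diff / total

def progressive_sample(transactions, min_support,
                       initial_size=500, growth=2.0, tolerance=0.95):
    """Grow the sample until consecutive samples yield similar frequent
    itemsets, then return the converged sample size and its itemsets."""
    size = min(initial_size, len(transactions))
    prev = mine_frequent_itemsets(random.sample(transactions, size), min_support)
    while size < len(transactions):
        size = min(int(size * growth), len(transactions))
        cur = mine_frequent_itemsets(random.sample(transactions, size), min_support)
        if sample_similarity(prev, cur) >= tolerance:
            return size, cur
        prev = cur
    return len(transactions), prev

if __name__ == "__main__":
    # Toy data: 20,000 random transactions over 30 items.
    random.seed(0)
    universe = [f"item{i}" for i in range(30)]
    data = [frozenset(random.sample(universe, random.randint(2, 6)))
            for _ in range(20000)]
    n, itemsets = progressive_sample(data, min_support=0.05)
    print(f"converged at sample size {n} with {len(itemsets)} frequent itemsets")

A naive variant would re-mine the full rule set at every step and compare rules directly; the paper's contribution is to make this convergence check both cheaper (via equivalence classes) and more faithful (via a support-sensitive measure) than the simple loop sketched here.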