• DocumentCode
    467832
  • Title

    A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming

  • Author

    Chiou, Chuang-Kai ; Tseng, Judy C R

  • Author_Institution
    Chung Hua Univ., Hsinchu
  • Volume
    4
  • fYear
    2007
  • fDate
    19-22 Aug. 2007
  • Firstpage
    2257
  • Lastpage
    2262
  • Abstract
    Apriori is an influential and well-known algorithm for mining association rules. However, the main drawback of the Apriori algorithm is the large number of candidate itemsets it generates. Several hash-based algorithms, such as DHP and MPIP, have been proposed to deal with this problem. DHP employs hash functions to filter out unpromising candidate itemsets. MPIP further improves on DHP by employing minimal perfect hashing functions to avoid generating candidate itemsets. Although MPIP achieves very promising mining efficiency, the memory space it requires increases rapidly as the number of items grows. To obtain even better mining efficiency while reducing the memory space required, this paper proposes a sorting-indexing-trimming (SIT) algorithm for mining association rules. SIT uses sorting, indexing, and trimming techniques to reduce the number of itemsets to be considered. Then, to exploit the advantages of both Apriori and MPIP, a revised MPIP algorithm is employed to deal with 2-itemsets, and a revised Apriori algorithm to deal with k-itemsets for k>2. Experimental results show that SIT outperforms both Apriori and MPIP, even though it requires less memory space than MPIP.
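For readers unfamiliar with the hash-based filtering the abstract attributes to DHP, the idea can be sketched roughly as follows. This is a minimal illustration of DHP-style bucket pruning for 2-itemsets only, not the SIT or MPIP algorithm from the paper; the function name `frequent_pairs_hash_filtered`, the `num_buckets` parameter, and the use of Python's built-in `hash` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_hash_filtered(transactions, min_support, num_buckets=7):
    """Return {pair: support} for frequent 2-itemsets, pruning candidates DHP-style.

    Hypothetical helper for illustration; not taken from the paper.
    """
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    def bucket_of(pair):
        # Illustrative hash function (an assumption); any function of the pair works.
        return hash(pair) % num_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Candidate generation: pair up frequent items (Apriori), then drop pairs
    # whose bucket total is below min_support, since they cannot be frequent.
    candidates = {
        pair
        for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(pair)] >= min_support
    }

    # Pass 2: count exact support for the surviving candidates only.
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidates:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}


transactions = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
result = frequent_pairs_hash_filtered(transactions, min_support=2)
```

Because a bucket total is always at least the true support of any pair hashed into it, the filter never discards a frequent pair; a smaller `num_buckets` only makes the pruning weaker, which hints at the time versus memory trade-off the abstract discusses.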
  • Keywords
    data mining; indexing; Apriori algorithm; hash-based algorithm; scalable association rule mining; sorting-indexing-trimming algorithm; Association rules; Computer science; Cybernetics; Itemsets; Machine learning; Machine learning algorithms; Sorting; Transaction databases; Association rule
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2007 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-0973-0
  • Electronic_ISBN
    978-1-4244-0973-0
  • Type

    conf

  • DOI
    10.1109/ICMLC.2007.4370521
  • Filename
    4370521