DocumentCode
467832
Title
A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Triming
Author
Chiou, Chuang-Kai ; Tseng, Judy C R
Author_Institution
Chung Hua Univ., Hsinchu
Volume
4
fYear
2007
fDate
19-22 Aug. 2007
Firstpage
2257
Lastpage
2262
Abstract
Apriori is an influential and well-known algorithm for mining association rules. However, the main drawback of Apriori algorithm is the large amount of candidate itemsets it generates. Several hash-based algorithms, such as DHP and MPIP, were proposed to deal with the problem. DHP employs hash functions to filter out potential-less candidate itemsets. MPIP further improves DHP by employing minimal perfect hashing functions to avoid generation of candidate itemsets. Though MPIP results in a very promising mining efficiency, the memory space required in MPIP increases rapidly as the number of items grows. To obtain even better mining efficiency while reducing the memory space required, a sorting-indexing-trimming (SIT) algorithm for mining association rules is proposed in this paper. SIT uses the sorting, indexing, and trimming techniques to reduce the amount of itemsets to be considered. Then, to utilize both the advantages of Ariori and MPIP, a revised MPIP algorithm is employed to deal with 2-itemsets, and a revised apriori algorithm to deal with Mtemsets for k>2. Though the memory space required in SIT is less than MPIP, from the experiment results, SIT outperforms both Apriori and MPIP.
Keywords
data mining; indexing; Apriori algorithm; hash-based algorithm; scalable association rule mining; sorting-indexing-trimming algorithm; Association rules; Computer science; Cybernetics; Data mining; Indexing; Itemsets; Machine learning; Machine learning algorithms; Sorting; Transaction databases; Apriori algorithm; Association rule; Data mining;
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4244-0973-0
Electronic_ISBN
978-1-4244-0973-0
Type
conf
DOI
10.1109/ICMLC.2007.4370521
Filename
4370521
Link To Document