DocumentCode :
467832
Title :
A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Triming
Author :
Chiou, Chuang-Kai ; Tseng, Judy C R
Author_Institution :
Chung Hua Univ., Hsinchu
Volume :
4
fYear :
2007
fDate :
19-22 Aug. 2007
Firstpage :
2257
Lastpage :
2262
Abstract :
Apriori is an influential and well-known algorithm for mining association rules. However, the main drawback of Apriori algorithm is the large amount of candidate itemsets it generates. Several hash-based algorithms, such as DHP and MPIP, were proposed to deal with the problem. DHP employs hash functions to filter out potential-less candidate itemsets. MPIP further improves DHP by employing minimal perfect hashing functions to avoid generation of candidate itemsets. Though MPIP results in a very promising mining efficiency, the memory space required in MPIP increases rapidly as the number of items grows. To obtain even better mining efficiency while reducing the memory space required, a sorting-indexing-trimming (SIT) algorithm for mining association rules is proposed in this paper. SIT uses the sorting, indexing, and trimming techniques to reduce the amount of itemsets to be considered. Then, to utilize both the advantages of Ariori and MPIP, a revised MPIP algorithm is employed to deal with 2-itemsets, and a revised apriori algorithm to deal with Mtemsets for k>2. Though the memory space required in SIT is less than MPIP, from the experiment results, SIT outperforms both Apriori and MPIP.
Keywords :
data mining; indexing; Apriori algorithm; hash-based algorithm; scalable association rule mining; sorting-indexing-trimming algorithm; Association rules; Computer science; Cybernetics; Data mining; Indexing; Itemsets; Machine learning; Machine learning algorithms; Sorting; Transaction databases; Apriori algorithm; Association rule; Data mining;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
Type :
conf
DOI :
10.1109/ICMLC.2007.4370521
Filename :
4370521
Link To Document :
بازگشت