DocumentCode
477820
Title
Mining Sequential Pattern Using DF2Ls
Author
Yusheng, Xu ; Lanhui, Zhang ; Zhixin, Ma ; Lian, Li ; Chen, Xiaoyun ; Dillon, Tharam S.
Author_Institution
Sch. of Inf. Sci. & Technol., Lanzhou Univ., Lanzhou
Volume
2
fYear
2008
fDate
18-20 Oct. 2008
Firstpage
600
Lastpage
604
Abstract
In this paper, building on the SEP and IEP strategies proposed in our previous work, we present two novel pruning strategies, DSEP (dynamic sequence extension pruning) and DIEP (dynamic item extension pruning), which can be used in all Apriori-like sequence mining algorithms and lattice-theoretic approaches. DSEP and DIEP use DF2Ls (dynamic frequent 2-sequence lists), which are built from previous enumerations, to prune infrequent candidate sequences during the mining process. At the cost of a little more memory, the proposed pruning strategies can prune invalid parts of the search space and effectively decrease the total cost of frequency counting. To test their effectiveness, we optimize SPAM with the proposed pruning strategies and present the improved algorithm, SPAM+, which uses DSEP and DIEP to prune the search space of SPAM by sharing dynamic frequent 2-sequence lists. A comprehensive performance study shows that SPAM+ outperforms SPAM by a factor of 10 on small datasets and by 35% to 58% on reasonably large datasets.
Keywords
data mining; Apriori-like sequence mining algorithms; IEP; SEP; SPAM; dynamic frequent 2-sequence Lists; dynamic item extension pruning; dynamic sequence extension pruning; lattice theory; pruning strategies; sequential pattern mining; Costs; Data mining; Databases; Electronic mail; Frequency; Itemsets; Sequences; Space exploration; Testing; Unsolicited electronic mail;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location
Shandong
Print_ISBN
978-0-7695-3305-6
Type
conf
DOI
10.1109/FSKD.2008.29
Filename
4666187