Title :
Mining emerging substrings
Author :
Chan, Sarah ; Kao, Ben ; Yip, C.L. ; Tang, Michael
Author_Institution :
Dept. of Comput. Sci. & Inf. Syst., City Univ. of Hong Kong, China
Abstract :
We introduce a new type of KDD pattern called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring that occurs more frequently in that class than in other classes. ESs are important to sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We propose a suffix tree-based framework for mining ESs, and study the effectiveness of applying one or more pruning techniques at different stages of our ES mining algorithm. Experimental results show that if the target class is small relative to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques achieve a considerable performance gain.
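The definition of an emerging substring can be illustrated with a small brute-force sketch. This is not the suffix tree-based framework the abstract describes, and it applies no pruning; it simply enumerates every substring. The abstract does not state how "more frequently" is measured, so the sketch assumes support is the fraction of sequences in a class that contain the substring, and that a substring qualifies when its support in the target class meets a minimum and exceeds its support in the other classes by a growth-rate factor. The function names and the `min_support` and `min_growth` parameters are hypothetical.

```python
from collections import Counter


def substring_support(sequences):
    """Map each distinct substring to the fraction of sequences containing it."""
    counts = Counter()
    for seq in sequences:
        # Use a set so that a substring is counted once per sequence.
        seen = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)
        }
        counts.update(seen)
    n = len(sequences)
    return {s: c / n for s, c in counts.items()}


def emerging_substrings(target, others, min_support=0.5, min_growth=2.0):
    """Return {substring: growth rate} for substrings that contrast the classes.

    Assumed criterion (not taken from the abstract): support in `target`
    is at least `min_support`, and the ratio of target support to support
    in `others` is at least `min_growth`.
    """
    target_sup = substring_support(target)
    other_sup = substring_support(others)
    result = {}
    for s, sup in target_sup.items():
        if sup < min_support:
            continue
        base = other_sup.get(s, 0.0)
        # A substring absent from the other classes has unbounded growth.
        growth = float("inf") if base == 0.0 else sup / base
        if growth >= min_growth:
            result[s] = growth
    return result
```

For example, with target class `["abcd", "abce", "xabc"]` and other sequences `["bcd", "cde", "abx", "bce"]`, calling `emerging_substrings` with `min_support=1.0` and `min_growth=2.0` keeps `"a"`, `"ab"`, `"bc"`, and `"abc"`; the last never occurs outside the target class. Enumerating all substrings is quadratic in sequence length, which is the cost a suffix tree and pruning are meant to avoid.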
Keywords :
data mining; pattern recognition; string matching; KDD patterns; contrasts; pruning techniques; sequence classification; sequence classifiers; sequence database; single-class emerging substring mining; suffix tree-based framework; Classification tree analysis; Companies; Computer science; Data mining; Databases; Electronic switching systems; Humans; Information systems; Partitioning algorithms; Performance gain
Conference_Title :
Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
Conference_Location :
Kyoto, Japan
Print_ISBN :
0-7695-1895-8
DOI :
10.1109/DASFAA.2003.1192375