Title :
Efficient weighted mining of repetitive subsequences
Author :
Lee, EunJu ; Kim, Wonyoung ; Ryu, Joonsuk ; Kim, Ungmo
Author_Institution :
Dept. of Comput. Eng., Sungkyunkwan Univ., Suwon, South Korea
Abstract :
Sequential pattern mining has recently become an important research area with broad applications, and discovering frequent subsequences in a sequence database is a valuable task. However, a long frequent sequential pattern contains a combinatorial number of frequent subsequences, so mining generates an exponential number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space. A more practical and scalable alternative is therefore required for discovering subsequential patterns. If a pattern has repetitive subsequences in a sequence, the occurrences must be distinguished because each carries a different weight in the pattern. For this reason, gap weights are needed when mining repetitive subsequences. So far, no subsequential pattern mining method takes gap weights into account, even though they are very important in real-world applications. In this paper, we propose an algorithm, EWM (Efficient Gap-Weighted Mining), for mining weighted repetitive subsequences with gap weights. EWM can handle situations in which identical subsequences must be distinguished. Furthermore, we introduce the concept of a gap weight for subsequences that have different gaps between events. To this end, we define and use a new type of database that represents sequence data efficiently. When used to discover all subsequence patterns, EWM may lose some information, but it is both efficient and scalable at pruning infrequent subsequences and discovering ordered subsequential patterns.
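The abstract does not give the gap-weight formula, so the following is only an illustrative sketch of the general idea that repeated occurrences of the same subsequence can be told apart by the gaps between their events. It is not the EWM algorithm. The function names (`occurrences`, `gap_weight`, `weighted_support`), the `decay` parameter, and the weighting rule `decay ** skipped_events` are all assumptions made for illustration, and the brute-force enumeration is exactly the kind of exponential search the paper aims to avoid.

```python
from itertools import combinations


def occurrences(sequence, pattern):
    """Return every index tuple at which `pattern` occurs as a subsequence.

    Brute force over all index combinations; fine for a toy example only.
    """
    k = len(pattern)
    return [
        idx
        for idx in combinations(range(len(sequence)), k)
        if all(sequence[i] == p for i, p in zip(idx, pattern))
    ]


def gap_weight(indices, decay=0.5):
    """Hypothetical gap weight (an assumption, not the paper's definition).

    Each event skipped between the first and last matched positions
    multiplies the weight by `decay`, so tighter occurrences weigh more.
    """
    skipped = (indices[-1] - indices[0]) - (len(indices) - 1)
    return decay ** skipped


def weighted_support(sequence, pattern, decay=0.5):
    """Sum the hypothetical gap weights over all occurrences of `pattern`."""
    return sum(gap_weight(idx, decay) for idx in occurrences(sequence, pattern))


# The pattern A,B occurs three times in A,B,A,B: twice with no gap and
# once spanning the whole sequence with two skipped events.
seq = list("ABAB")
pat = list("AB")
found = occurrences(seq, pat)
support = weighted_support(seq, pat)
```

Under these assumptions the three occurrences of A,B receive weights 1, 0.25, and 1, so identical subsequences contribute differently depending on how far apart their events lie.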
Keywords :
combinatorial mathematics; data mining; number theory; sequences; combinatorial frequent subsequence number; efficient gap-weighted mining; exponential number; frequent long sequence pattern; sequence database; weighted repetitive subsequence mining; Application software; Bioinformatics; Credit cards; DNA; Genetic mutations; Genomics; History; Pattern analysis; Sequences; Spatial databases;
Conference_Title :
Web Society, 2009. SWS '09. 1st IEEE Symposium on
Conference_Location :
Lanzhou
Print_ISBN :
978-1-4244-4157-0
Electronic_ISBN :
978-1-4244-4158-7
DOI :
10.1109/SWS.2009.5271708