DocumentCode :
3098119
Title :
Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery
Author :
Lin, Ming-Yen ; Lee, Suh-Yin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., National Chiao Tung Univ., Taiwan, China
fYear :
2003
fDate :
6-9 Jan. 2003
Abstract :
The discovery of sequential patterns, which extends beyond the frequent item-set finding of association rule mining, has become a challenging task due to its complexity. Essentially, a user specifies a minimum support threshold with respect to the database to find the desired patterns. The mining process is usually iterative, since the user must try various thresholds to obtain a satisfactory result. Therefore, the time-consuming process has to be repeated several times. However, current approaches are inadequate for such a process due to the long execution time required for each trial. In order to minimize the total execution time and the response time for each trial, we propose a knowledge base assisted algorithm for interactive sequence discovery, called KISP. KISP constructs a knowledge base that accumulates the pattern information in each individual mining run, eliminates a considerable number of potential patterns to facilitate efficient support counting, and speeds up the whole process. In addition, we further optimize the algorithm by direct generation of the reduced candidate sets and concurrent counting of variable-sized candidates. For some queries, KISP may eliminate database access completely. The conducted experiments show that KISP outperforms GSP, a state-of-the-art sequence mining algorithm, by several orders of magnitude for interactive sequence discovery.
Keywords :
data mining; database theory; pattern matching; sequences; temporal databases; KISP; association rule mining; incremental pattern discovery; interactive sequence discovery; knowledge base assisted algorithm; sequential pattern discovery; sequential pattern mining; Association rules; Computer science; Data analysis; Databases; Delay; Diseases; Electronic mail; Itemsets; Time factors
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Sciences, 2003. Proceedings of the 36th Annual Hawaii International Conference on
Print_ISBN :
0-7695-1874-5
Type :
conf
DOI :
10.1109/HICSS.2003.1173921
Filename :
1173921