DocumentCode :
3461273
Title :
Developing an Efficient Knowledge Discovering Model for Mining Fuzzy Multi-level Sequential Patterns in Sequence Databases
Author :
Huang, T.C.-K.
Author_Institution :
Dept. of Bus. Adm., Nat. Chung Cheng Univ., Minsyong, Taiwan
fYear :
2009
fDate :
June 30 2009-July 2 2009
Firstpage :
362
Lastpage :
371
Abstract :
Sequential pattern mining from sequence databases has been recognized as an important data mining problem with various applications. Items in a sequence database can be organized into a concept hierarchy according to a taxonomy. Based on the hierarchy, sequential patterns can be found not only at the leaf nodes (individual items) of the hierarchy, but also at higher levels; this is called multiple-level sequential pattern mining. In previous research, however, taxonomies were based on crisp relationships between any two disjoint levels, and so cannot handle the uncertainty and fuzziness of real life. For example, tomatoes could be classified as a fruit, but could also be regarded as a vegetable. To deal with the fuzzy nature of taxonomies, Chen and Huang developed a novel knowledge discovering model to mine fuzzy multilevel sequential patterns, where the relationship from one level to another can be represented by a value between 0 and 1. In their work, a GSP-like algorithm was developed to find fuzzy multilevel sequential patterns. This algorithm, however, faces a difficult problem: the mining process may have to generate and examine a huge set of combinatorial subsequences, and it requires multiple scans of the database. In this paper, we propose a new efficient algorithm to mine this type of pattern based on the divide-and-conquer strategy. In addition, another efficient algorithm is developed to discover fuzzy cross-level sequential patterns. Since the proposed algorithm greatly reduces the effort of candidate subsequence generation, performance is improved significantly. Experiments show that the proposed algorithm is much more efficient and scalable than the previous one. For mining real-life databases, our work enhances the model's practicability and could promote more applications in business.
Keywords :
data mining; divide and conquer methods; fuzzy set theory; pattern classification; divide and conquer strategy; efficient knowledge discovering model; fuzzy cross level sequential pattern; fuzzy multilevel sequential pattern; hierarchy individual item; leaf node; mining process; sequence database; sequential pattern mining; taxonomy fuzzy nature; Dairy products; Data mining; Databases; Fuzzy sets; Information analysis; Knowledge management; Pattern recognition; Taxonomy; Test pattern generators; Uncertainty;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
New Trends in Information and Service Science, 2009. NISS '09. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-3687-3
Type :
conf
DOI :
10.1109/NISS.2009.8
Filename :
5260749