DocumentCode
3251546
Title
InfoMiner+: mining partial periodic patterns with gap penalties
Author
Yang, Jiong ; Wang, Wei ; Yu, Philip S.
fYear
2002
fDate
2002
Firstpage
725
Lastpage
728
Abstract
In this paper, we focus on mining periodic patterns that allow some degree of imperfection in the form of random replacement from a perfect periodic pattern. Information gain was proposed to identify patterns whose events have vastly different occurrence frequencies and to adjust for deviation from a pattern. However, it imposes no penalty when there are gaps between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or near perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we need to find a subsequence S′ such that, for a pattern P, the generalized information gain of P in S′ is high. This is particularly useful in locating repeats in DNA sequences. In this paper, we develop an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and their associated subsequences.
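The abstract does not give the exact formula for generalized information gain, so the following is only a minimal illustrative sketch under stated assumptions, not the InfoMiner+ algorithm. It assumes InfoMiner-style event information I(a) = log_{|A|}(1/Prob(a)), with Prob(a) estimated from frequency and an alphabet size |A| of at least 2. It also assumes that the information of a pattern I(P) is the sum of I(a) over its fixed (non-`*`) positions. The gap penalty is a hypothetical choice: each non-matching period segment in S′ subtracts I(P), giving gain = I(P) × (matches − 1) − I(P) × mismatches. All function names are hypothetical. Because this gain equals I(P) × (matches − mismatches − 1), the best contiguous subsequence S′ can be found with a maximum-subarray (Kadane) scan over per-segment scores of +1 and −1.

```python
import math
from collections import Counter

def event_information(seq, alphabet_size):
    # Assumed definition: I(a) = log_{|A|}(1 / Prob(a)), Prob(a) estimated by frequency in seq.
    counts = Counter(seq)
    n = len(seq)
    return {a: math.log(n / c, alphabet_size) for a, c in counts.items()}

def pattern_information(pattern, info):
    # I(P): sum of event information over fixed positions; '*' is a don't-care position.
    return sum(info.get(a, 0.0) for a in pattern if a != "*")

def segment_matches(segment, pattern):
    # A period segment matches if every fixed position of the pattern agrees with it.
    return all(p == "*" or p == s for p, s in zip(pattern, segment))

def best_subsequence(seq, pattern, alphabet_size):
    """Hypothetical gap-penalized gain; returns (gain, best contiguous subsequence S')."""
    l = len(pattern)
    info = event_information(seq, alphabet_size)
    ip = pattern_information(pattern, info)
    # Split seq into consecutive, non-overlapping period segments of length l.
    segments = [seq[k:k + l] for k in range(0, len(seq) - l + 1, l)]
    if not segments:
        return 0.0, seq[:0]
    # +1 for a segment that matches P, -1 for a gap (non-matching) segment.
    scores = [1 if segment_matches(s, pattern) else -1 for s in segments]
    best_sum, best_range = -math.inf, (0, 0)
    cur_sum, cur_start = 0, 0
    for k, sc in enumerate(scores):
        # Kadane's step: restart the window when the running sum is not positive.
        if cur_sum <= 0:
            cur_sum, cur_start = sc, k
        else:
            cur_sum += sc
        if cur_sum > best_sum:
            best_sum, best_range = cur_sum, (cur_start, k + 1)
    # I(P) * (matches - 1) - I(P) * mismatches == I(P) * (best_sum - 1).
    gain = ip * (best_sum - 1)
    start, end = best_range
    return gain, seq[start * l:end * l]
```

For example, `best_subsequence("cdababababcd", "ab", 4)` splits the sequence into the period segments `cd ab ab ab ab cd`. It selects the run of four matching `ab` segments as S′ and excludes the flanking gaps, which would only lower the penalized gain.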
Keywords
DNA; biology computing; data mining; sequences; time series; DNA sequence repeat location; InfoMiner+; gap penalties; generalized information gain; imperfection; information gain measure; partial periodic pattern mining; pattern occurrences; random replacement; subsequences; Aggregates; Association rules; DNA; Data mining; Frequency; Gain measurement; Scattering; Sequences; Time measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1184039
Filename
1184039