Title :
InfoMiner+: mining partial periodic patterns with gap penalties
Author :
Yang, Jiong ; Wang, Wei ; Yu, Philip S.
Abstract :
In this paper we focus on mining periodic patterns while allowing some degree of imperfection, in the form of random replacement from a perfect periodic pattern. Information gain was proposed to identify patterns with events of vastly different occurrence frequencies and to adjust for deviation from a pattern. However, it does not impose a penalty when there are gaps between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or near perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we need to find a subsequence S′ such that, for a pattern P, the generalized information gain of P in S′ is high. This is particularly useful in locating repeats in DNA sequences. We develop an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and their associated subsequences.
Keywords :
DNA; biology computing; data mining; sequences; time series; DNA sequence repeat location; InfoMiner+; gap penalties; generalized information gain; imperfection; information gain measure; partial periodic pattern mining; pattern occurrences; random replacement; subsequences; Aggregates; Association rules; DNA; Data mining; Frequency; Gain measurement; Scattering; Sequences; Time measurement;
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
DOI :
10.1109/ICDM.2002.1184039