• DocumentCode
    3251546
  • Title

    InfoMiner+: mining partial periodic patterns with gap penalties

  • Author

    Yang, Jiong ; Wang, Wei ; Yu, Philip S.

  • fYear
    2002
  • fDate
    2002
  • Firstpage
    725
  • Lastpage
    728
  • Abstract
    In this paper we focus on mining periodic patterns allowing some degree of imperfection in the form of random replacement from a perfect periodic pattern. Information gain was proposed to identify patterns with events of vastly different occurrence frequencies and adjust for deviation from a pattern. However, it does not involve a penalty if there exists some gap between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or near perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we need to find a subsequence S' such that for a pattern P, the generalized information gain of P in S' is high. This is particularly useful in locating repeats in DNA sequences. In this paper, we develop an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and associated subsequences.
  • Keywords
    DNA; biology computing; data mining; sequences; time series; DNA sequence repeat location; InfoMiner+; gap penalties; generalized information gain; imperfection; information gain measure; partial periodic pattern mining; pattern occurrences; random replacement; subsequences; Aggregates; Association rules; DNA; Data mining; Frequency; Gain measurement; Scattering; Sequences; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type

    conf

  • DOI
    10.1109/ICDM.2002.1184039
  • Filename
    1184039