• DocumentCode
    2497699
  • Title

    An Efficient Subsequences Mining Algorithm

  • Author

    Pan, Hongyan

  • Author_Institution
    Dept. of Comput. Sci., Zhejiang Bus. Technol. Inst., Ningbo, China
  • fYear
    2009
  • fDate
    11-13 June 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    As a step toward analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences, where each sequence is an ordered list of events, the pattern we would like to mine is called a repetitive gapped subsequence. Unlike support in sequential pattern mining, repetitive support captures not only repetitions of a pattern across different sequences but also repetitions within a single sequence. Given a user-specified support threshold min_sup, we study the problem of finding all patterns whose repetitive support is no less than min_sup. To obtain a compact yet complete result set and to improve efficiency, we also study finding closed patterns. We propose efficient mining algorithms, based on the idea of instance growth, that find the complete set of desired patterns. Our performance study on various datasets shows the efficiency of our approach, and a case study shows its utility.
  • Keywords
    DNA; biology computing; data mining; pattern clustering; proteins; closed patterns; mining algorithm; repetitive gapped subsequence; repetitive support; Computer science; Credit cards; Data mining; Databases; History; Information resources; Partial response channels; Pattern analysis; Sequences; Tin;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-2901-1
  • Electronic_ISBN
    978-1-4244-2902-8
  • Type

    conf

  • DOI
    10.1109/ICBBE.2009.5162317
  • Filename
    5162317