• DocumentCode
    2690854
  • Title
    An efficient sequential pattern mining algorithm for motifs with gap constraints
  • Author
    Liao, Vance Chiang-Chi ; Chen, Ming-Syan
  • Author_Institution
    Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    1
  • Lastpage
    1
  • Abstract
    Mining biological data can provide insight into various realms of biology; one example is finding co-occurring biosequences, which is essential for biological analysis and data mining. Sequential pattern mining reveals implicit motifs of all lengths, which have specific structures and are of functional significance in biological sequences. Traditional sequential pattern mining algorithms are inefficient for small alphabets and long sequences, such as DNA and protein sequences, so a different approach is needed. This work proposes DFSG, a Depth-First Spelling algorithm for mining sequential patterns (motifs) with Gap constraints in biological sequences. On biological sequences, the runtime of DFSG is substantially shorter than that of GenPrefixSpan, a method based on PrefixSpan, which is one of the fastest traditional sequential pattern mining algorithms.
  • Keywords
    DNA; RNA; bioinformatics; data mining; molecular biophysics; molecular configurations; DFSG runtime; DNA sequences; biological analysis; biological data mining; biological sequences; depth-first spelling algorithm; functional significance; gap constraints; genprefixspan process; protein sequences; realms; traditional sequential pattern mining algorithms; Algorithm design and analysis; Classification algorithms; Proteins; Runtime; sequential patterns
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2559-2
  • Electronic_ISBN
    978-1-4673-2558-5
  • Type
    conf
  • DOI
    10.1109/BIBM.2012.6392660
  • Filename
    6392660