• DocumentCode
    3189726
  • Title
    An Efficient Technique for Mining Approximately Frequent Substring Patterns
  • Author
    Ji, Xiaonan; Bailey, James
  • Author_Institution
    Univ. of Melbourne, Melbourne
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    325
  • Lastpage
    330
  • Abstract
    Sequential patterns are used to discover knowledge in a wide range of applications. In many scenarios, however, pattern quality can be low because of short lengths or low supports. Furthermore, for dense datasets such as proteins, most sequential pattern mining algorithms return a tremendously large number of patterns, which are difficult to process and analyze. By relaxing the definition of frequency and allowing some mismatches, it is possible to discover higher-quality patterns. We call these patterns Frequent Approximate Substrings, or FAS-patterns, and we introduce an algorithm called FAS-Miner to handle the mining task efficiently. Experiments on real-world protein and DNA datasets show that FAS-Miner can discover patterns of much longer lengths and higher supports than standard sequential mining approaches.
  • Keywords
    DNA; biology computing; data mining; DNA datasets; FAS-Miner; FAS-patterns; approximately frequent substring pattern mining; frequent approximate substrings; knowledge discovery; pattern quality; sequential pattern mining algorithms; sequential patterns; Algorithm design and analysis; Application software; Computer science; Conferences; Data mining; Frequency; Laboratories; Pattern analysis; Protein engineering; Software engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • Print_ISBN
    978-0-7695-3019-2
  • Electronic_ISBN
    978-0-7695-3033-8
  • Type
    conf
  • DOI
    10.1109/ICDMW.2007.121
  • Filename
    4476687