  • DocumentCode
    2839676
  • Title
    Approximate Repeating Pattern Mining with Gap Requirements
  • Author
    He, Dan ; Zhu, Xingquan ; Wu, Xindong
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California Los Angeles, Los Angeles, CA, USA
  • fYear
    2009
  • fDate
    2-4 Nov. 2009
  • Firstpage
    17
  • Lastpage
    24
  • Abstract
    In this paper, we define a new research problem: mining approximate repeating patterns (ARP) with gap constraints, where a pattern's occurrences are identified by approximate rather than exact matching, a situation that is very common in the biological sciences. To solve the problem, we propose ArpGap (Approximate repeating pattern mining with Gap constraints), an algorithm with three major components: (1) a data-driven pattern generation approach that avoids generating unnecessary patterns; (2) a back-tracking pattern search process that discovers approximate occurrences of a pattern under gap constraints; and (3) an Apriori-like deterministic pruning approach that progressively prunes patterns and terminates the search process when necessary. Experimental results on synthetic and real-world protein sequences show that ArpGap is efficient in terms of memory consumption and computational cost.
  • Keywords
    data mining; pattern matching; search problems; Apriori-like deterministic pruning approach; ArpGap algorithm; approximate pattern matching; approximate repeating pattern mining; back-tracking pattern search process; biological sciences; data-driven pattern generation approach; memory consumption; real-world protein sequences; search process; Artificial intelligence; Australia; Biology; Computational efficiency; Computer science; Helium; Pattern matching; Proteins; Sequences; USA Councils; Back-Tracking; Dynamic Programming; Gap Requirements; Pattern Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Tools with Artificial Intelligence, 2009. ICTAI '09. 21st International Conference on
  • Conference_Location
    Newark, NJ
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4244-5619-2
  • Electronic_ISBN
    1082-3409
  • Type
    conf
  • DOI
    10.1109/ICTAI.2009.8
  • Filename
    5364679