• DocumentCode
    1983990
  • Title

    Pattern Matching with Wildcard Gaps Based on Cross List

  • Author

    Junyan Zhang ; Chenhui Yang

  • Author_Institution
    Key Lab. of Pattern Recognition & Intell. Inf. Process. of Sichuan, Chengdu Univ., Chengdu, China
  • Volume
    2
  • fYear
    2013
  • fDate
    28-29 Oct. 2013
  • Firstpage
    154
  • Lastpage
    156
  • Abstract
    Pattern matching is fundamental to applications such as text retrieval, string querying, and biological sequence analysis, so effective algorithms for this kind of matching are in great demand. In this paper, a wildcard is defined to match any one character in a sequence. Multiple wildcards form a gap, and the length of a flexible gap is arbitrary. We design the CLPM algorithm, which uses a cross-list index structure to realize pattern matching with flexible wildcard gaps. A preprocessing algorithm initializes the cross list so as to reduce the search space. In the CLPM algorithm, effective intervals are defined and computed from the start positions of each subpattern in each string, which helps to obtain the matching result set. Moreover, approximate pattern matching is converted to short exact pattern matching. Comparative experiments are conducted on the DBLP tile data set. The results show that the CLPM algorithm performs better than other algorithms in the same field.
  • Keywords
    data structures; string matching; CLPM algorithm design; DBLP tile data set; arbitrary flexible wildcard gap length; cross-list index structure; pattern matching approximation; preprocessing algorithm; search space reduction; sequence character matching; subpattern start positions; Algorithm design and analysis; Approximation algorithms; Educational institutions; Indexes; Pattern matching; Presses; Silicon; cross list; pattern matching; wildcard gaps;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Design (ISCID), 2013 Sixth International Symposium on
  • Conference_Location
    Hangzhou
  • Type

    conf

  • DOI
    10.1109/ISCID.2013.152
  • Filename
    6804851
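
The abstract's notion of a pattern whose subpatterns are separated by flexible wildcard gaps can be illustrated with a minimal sketch. This is not the CLPM algorithm and uses no cross-list index; it is a plain left-to-right scan, assuming that a gap may have any length, including zero. The function name `leftmost_match` and the representation of a pattern as a list of subpattern strings are hypothetical choices made for illustration only.

```python
def leftmost_match(text, subpatterns):
    """Return (start, end) spans of the leftmost in-order occurrence of each
    subpattern in text, or None if the subpatterns cannot all be placed.

    Assumption: consecutive subpatterns are separated by a flexible gap of
    arbitrary length (zero or more wildcard characters).
    """
    spans = []
    pos = 0  # earliest position at which the next subpattern may start
    for sub in subpatterns:
        start = text.find(sub, pos)
        if start == -1:
            return None  # this subpattern does not occur after the previous one
        end = start + len(sub)
        spans.append((start, end))
        pos = end  # the next subpattern must begin at or after this point
    return spans


if __name__ == "__main__":
    print(leftmost_match("abcabc", ["a", "c", "b"]))
    print(leftmost_match("abc", ["c", "a"]))
```

Because gap lengths are unconstrained in this sketch, always taking the earliest possible start for each subpattern is enough to decide whether a match exists. Bounded gaps, or enumerating every match, would need the start positions of all occurrences, which is the kind of information the abstract says CLPM computes intervals from.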