Title :
Pattern Matching with Wildcard Gaps Based on Cross List
Author :
Junyan Zhang ; Chenhui Yang
Author_Institution :
Key Lab. of Pattern Recognition & Intell. Inf. Process. of Sichuan, Chengdu Univ., Chengdu, China
Abstract :
Pattern matching is fundamental to applications such as text retrieval, string query, and biological sequence analysis, so effective algorithms for this kind of matching are in great demand. In this paper, a wildcard is defined to match any single character in a sequence. Multiple wildcards form a gap, and the length of a flexible gap is arbitrary. We design the CLPM algorithm, which uses a cross list index structure to realize pattern matching with flexible wildcard gaps. A preprocessing algorithm initializes the cross list so as to reduce the search space. In the CLPM algorithm, effective intervals are defined and computed from the start positions of each sub-pattern in each string, which helps to obtain the matching result set. Moreover, approximate pattern matching is converted to short exact pattern matching. Comparative experiments are conducted on the DBLP title data set. The results show that the CLPM algorithm performs better than comparable algorithms in the same field.
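The abstract gives no pseudocode, so the following is only a minimal Python sketch of the general idea of matching sub-patterns separated by bounded wildcard gaps, using the start positions of each sub-pattern. It is not the CLPM algorithm and does not build a cross list. The names `find_starts` and `match_with_gaps`, and the `(lo, hi)` gap bounds, are assumptions made for illustration.

```python
from bisect import bisect_left


def find_starts(text, sub):
    """Return every start index of `sub` in `text` (overlaps allowed)."""
    starts = []
    i = text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def match_with_gaps(text, subs, gaps):
    """Return sorted start positions where subs[0], gap, subs[1], ... match.

    gaps[k] = (lo, hi) is the assumed number of wildcard characters
    allowed between subs[k] and subs[k + 1], both bounds inclusive.
    """
    if len(gaps) != len(subs) - 1:
        raise ValueError("need exactly one gap between consecutive sub-patterns")
    positions = [find_starts(text, s) for s in subs]

    # Work backwards: `ok` holds the starts of subs[k] from which the
    # rest of the pattern can still be matched.
    ok = positions[-1]
    for k in range(len(subs) - 2, -1, -1):
        lo, hi = gaps[k]
        current = []
        for p in positions[k]:
            end = p + len(subs[k])
            # Interval of admissible start positions for the next sub-pattern.
            window_lo, window_hi = end + lo, end + hi
            j = bisect_left(ok, window_lo)
            if j < len(ok) and ok[j] <= window_hi:
                current.append(p)
        ok = current  # stays sorted because positions[k] is sorted
    return ok


if __name__ == "__main__":
    text = "abxxcdabcd"
    print(match_with_gaps(text, ["ab", "cd"], [(0, 2)]))  # [0, 6]
    print(match_with_gaps(text, ["ab", "cd"], [(1, 2)]))  # [0]
```

The sketch only reports where a match can start. An index-based method such as the one the abstract describes would presumably avoid rescanning the text for every sub-pattern, which this sketch does not attempt.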
Keywords :
data structures; string matching; CLPM algorithm design; DBLP title data set; arbitrary flexible wildcard gap length; cross-list index structure; pattern matching approximation; preprocessing algorithm; search space reduction; sequence character matching; subpattern start positions; Algorithm design and analysis; Approximation algorithms; Educational institutions; Indexes; Pattern matching; Presses; Silicon; cross list; pattern matching; wildcard gaps;
Conference_Title :
Computational Intelligence and Design (ISCID), 2013 Sixth International Symposium on
Conference_Location :
Hangzhou
DOI :
10.1109/ISCID.2013.152