DocumentCode :
3582106
Title :
Efficient multi-word parameterized matching on compressed text
Author :
Prasad, Rajesh ; Garg, Rama
Author_Institution :
Dept. of Comput. Sci., Yobe State Univ., Damaturu, Nigeria
fYear :
2014
Firstpage :
1
Lastpage :
6
Abstract :
Searching for a set of patterns {P1, P2, P3, ..., Pr}, r ≥ 1, inside a text T[1...n] is called the multi-pattern matching problem. A match is said to be a parameterized match (p-match) if the pattern and the matched text can be transformed into each other via some bijective mapping. Parameterized matching is mainly used in software maintenance, plagiarism detection, and detecting isomorphism in graphs. In the compressed parameterized matching problem, the task is to find all parameterized occurrences of a pattern (or set of patterns) in the compressed text without decompressing it. Compressing the text before matching reduces its size and also reduces the matching time. In this paper, we develop an efficient algorithm for the parameterized multi-word matching problem on compressed text, where both the patterns and the text are compressed before the actual matching is performed and each pattern is treated as a word. To compress the patterns and the text, we use an efficient compression code, Word Based Tagged Code (WBTC), and a bit-parallel algorithm is used for searching. Experimental results show that our algorithm is up to three times faster than searching the uncompressed text.
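To make the p-match definition concrete, the following is a minimal Python sketch of a multi-word parameterized match on uncompressed text. It is not the paper's method: it does not use WBTC compression or a bit-parallel search, and it checks each word naively. The function names `p_match` and `find_p_matches` are hypothetical, and the sketch assumes every symbol is a renameable parameter (no fixed symbols).

```python
def p_match(a, b):
    """Return True if a bijection on symbols maps string a onto string b.

    Assumption: every symbol is treated as a parameter (no fixed symbols).
    """
    if len(a) != len(b):
        return False
    fwd, bwd = {}, {}  # symbol maps a -> b and b -> a
    for x, y in zip(a, b):
        # Both directions must stay consistent for the map to be bijective.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def find_p_matches(words, patterns):
    """Return (word index, pattern) pairs where the word p-matches the pattern."""
    return [
        (i, p)
        for i, w in enumerate(words)
        for p in patterns
        if p_match(w, p)
    ]
```

For example, `find_p_matches(["abab", "abba", "xyz"], ["cdcd"])` reports only the word at index 0, because "abab" maps onto "cdcd" under a -> c, b -> d, while "abba" would need b to map to both d and c.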
Keywords :
data compression; parallel algorithms; string matching; text analysis; WBTC; bit-parallel algorithm; compressed text; compression code; parameterized multiword matching problem; word based tagged code; Algorithm design and analysis; Automata; Image coding; Indexes; Pattern matching; Vocabulary; Compressed parameterized matching; String matching; compressed pattern matching; information retrieval; multiple matching and word based tagged code;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Adaptive Science & Technology (ICAST), 2014 IEEE 6th International Conference on
Type :
conf
DOI :
10.1109/ICASTECH.2014.7068138
Filename :
7068138