Title :
Generalised Implementation for Fixed-Length Approximate String Matching under Hamming Distance and Applications
Author :
Solon Pissis;Ahmad Retha
Author_Institution :
Dept. of Inf., King´s Coll. London, London, UK
fDate :
5/1/2015 12:00:00 AM
Abstract :
Approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from a pattern x of length m ≤ n. Fixed-length approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from any factor of length h of a pattern x of length m ≤ n, where h ≤ m. It is thus a generalisation of approximate string matching and has numerous direct applications in molecular biology-motif extraction and circular sequence alignment, to name a few. MaxShiftM is a bit-parallel algorithm for fixed-length approximate string matching under the Hamming distance model with time complexity O(m⌈h/w⌉n), where w is the size of the computer word (Crochemore et al., 2010). An implementation of this algorithm is straightforward as long as the maximal length of alignments is less than or equal to w. In this article, our contribution is twofold: first, we propose a generalised implementation of MaxShiftM, that is, for any given h ≤ m, under the Hamming distance model; and second, we show how our implementation can be used to improve the accuracy and efficiency of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies.
Keywords :
"Approximation algorithms","Heuristic algorithms","Pattern matching","Hamming distance","Computers","Dynamic programming","DNA"
Conference_Titel :
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
DOI :
10.1109/IPDPSW.2015.106