مرکز منطقه ای اطلاع رساني علوم و فناوري - Hard problems in similarity searching Original Research Article

Abstract :

The Closest Substring Problem is one of the most important problems in the field of computational biology. It is stated as follows: given a set of t sequences s1,s2,…,st over an alphabet Σ, and two integers k,d with d⩽k, can one find a string s of length k and, for all i=1,2,…,t, substrings oi of si, all of length k, such that d(s,oi)⩽d (for all i=1,2,…,t)? (here, d(.,.) represents the Hamming distance). Closest Substring was shown to be NP-hard (Proceedings of 10th SODA, 1999, pp. 633–642) and W[1]-hard with respect to the number t of input sequences (Proceedings of STACS’02, Lecture Notes in Computer Science, Vol. 2285, 2002, pp. 262–273); recently, an important number of results concerning the parameterized computational complexity of Closest Substring has been added in Evans et al. (Theoret. Comput. Sci. 306 (1–3) (2003) 407). In this paper we introduce and analyze two variants of the Closest Substring Problem, obtained by imposing restrictions on the pairwise distances between the substrings oi: