DocumentCode
1990761
Title
QOMA2: Optimizing the alignment of many sequences
Author
Zhang, Xu ; Kahveci, Tamer
Author_Institution
Univ. of Florida, Gainesville
fYear
2007
fDate
14-17 Oct. 2007
Firstpage
780
Lastpage
787
Abstract
We consider the problem of aligning multiple protein sequences with the goal of maximizing the SP (sum-of-pairs) score, when the number of sequences is large. The QOMA (quasi-optimal multiple alignment) algorithm addressed this problem when the number of sequences is small. However, as the number of sequences increases, QOMA becomes impractical. This paper develops a new algorithm, QOMA2, which optimizes the SP score of the alignment of arbitrarily large number of sequences. Given an initial (potentially sub-optimal) alignment , QOMA2 selects short subsequences from this alignment by placing a window on it. It quickly estimates the amount of improvement that can be obtained by optimizing the alignment of the subsequences in short windows on this alignment. This estimate is called the SW (sum of weights) score. It employs a dynamic programming algorithm that selects the set of window positions with the largest total expected improvement. It partitions the subsequences within each window into clusters such that the number of subsequences in each cluster is small enough to be optimally aligned within a given time. Also, it aims to select these clusters so that the optimal alignment of the subsequences in these clusters produces the highest expected SP score. The experimental results show that QOMA2 produces high SP scores quickly even for large number of sequences. They also show that the SW score and the resulting SP score are highly correlated. This implies that it is promising to aim for optimizing the SW score since it is much cheaper than aligning multiple sequences optimally. The software and the benchmark data set are available from the authors on request.
Keywords
biology computing; dynamic programming; molecular biophysics; molecular configurations; proteins; QOMA2; dynamic programming; optimization; protein sequences; quasi-optimal multiple alignment; sequence alignment; Clustering algorithms; Computational biology; Dynamic programming; Heuristic algorithms; Information science; NP-complete problem; Partitioning algorithms; Phylogeny; Protein engineering; Sequences;
fLanguage
English
Publisher
ieee
Conference_Titel
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location
Boston, MA
Print_ISBN
978-1-4244-1509-0
Type
conf
DOI
10.1109/BIBE.2007.4375649
Filename
4375649
Link To Document