DocumentCode :
1966416
Title :
A new heuristic for multiple sequence alignment
Author :
Agrawal, Ankit ; Khaitan, Siddhartha Kumar
Author_Institution :
Dept. of Comput. Sci., Iowa State Univ., Ames, IA
fYear :
2008
fDate :
18-20 May 2008
Firstpage :
215
Lastpage :
217
Abstract :
Multiple sequence alignment is an important task in bioinformatics which forms the basis of many other tasks like protein structure prediction, protein function prediction and phylogenetic analysis. An optimal multiple sequence alignment for a given set of sequences is difficult to determine by standard dynamic programming algorithms because of impractically high computational complexity. Therefore, heuristics are commonly used to reduce the alignment construction time, even though the resulting alignment may not be optimal. ClustalW is one of the most popular progressive multiple sequence alignment algorithm where the pairwise sequence alignments are done according to a static guide-tree based on the sequences alone. For a multiple sequence alignment of N sequences, each of length O(n), the ClustalW algorithm takes O(N2n2) time. In this paper, a modification to the algorithm is proposed which reduces the time complexity of the algorithm to O(N log2 Nn2). The proposed heuristic also makes the alignment process more dynamic as the order of sequences added to the multiple sequence alignment also depends on the already computed multiple sequence alignments.
Keywords :
biology computing; computational complexity; trees (mathematics); ClustalW algorithm; bioinformatics; multiple sequence alignment algorithm; pairwise sequence alignment; static guide-tree; Bioinformatics; Computational complexity; Computer science; Databases; Dynamic programming; Heuristic algorithms; Iterative algorithms; Iterative methods; Phylogeny; Protein engineering;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Electro/Information Technology, 2008. EIT 2008. IEEE International Conference on
Conference_Location :
Ames, IA
Print_ISBN :
978-1-4244-2029-2
Electronic_ISBN :
978-1-4244-2030-8
Type :
conf
DOI :
10.1109/EIT.2008.4554299
Filename :
4554299
Link To Document :
بازگشت