Overestimation for Multiple Sequence Alignment

Author

Cazenave, Tristan

Author_Institution

Dept. Informatique, Univ. Paris 8

fYear

2007

fDate

1-5 April 2007

Firstpage

159

Lastpage

164

Abstract

Multiple sequence alignment is an important problem in computational biology. A-star is an algorithm that can be used to find exact alignments. We present a simple modification of the A-star algorithm that greatly improves multiple sequence alignment in both time and memory, at the cost of a small loss of accuracy. It consists of overestimating the admissible heuristic. For random sequences of length 250, a typical speedup is 47, with a memory gain of 13 and an error rate of 0.09%. For real sequences, the speedup can be greater than 20,000 and the memory gain greater than 150, with error rates ranging from 0.08% to 0.67% for the sequences we tested. Overestimation can align sequences that cannot be aligned with the exact algorithm.
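
The idea of overestimating an admissible heuristic can be illustrated with a small sketch. The abstract does not say how the overestimate is formed, so the code below assumes one common choice, weighted A-star, where the heuristic is multiplied by a constant `weight`. The cost values (`GAP`, `MISMATCH`), the sum-of-pairs objective, the pairwise-suffix heuristic, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import heapq
from itertools import product

GAP, MISMATCH = 2, 1  # assumed costs; a match costs 0


def pair_cost(a, b):
    """Cost of two symbols in the same column; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0
    if a == '-' or b == '-':
        return GAP
    return 0 if a == b else MISMATCH


def suffix_table(s, t):
    """D[i][j] = optimal pairwise cost of aligning s[i:] with t[j:]."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m:
                options.append(D[i + 1][j + 1] + pair_cost(s[i], t[j]))
            if i < n:
                options.append(D[i + 1][j] + GAP)
            if j < m:
                options.append(D[i][j + 1] + GAP)
            D[i][j] = min(options)
    return D


def align_cost(seqs, weight=1.0):
    """Search the alignment lattice with f = g + weight * h.

    Returns (sum-of-pairs cost, number of expanded nodes).
    weight == 1.0 is exact A-star; weight > 1.0 overestimates h.
    """
    k = len(seqs)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    tables = {(a, b): suffix_table(seqs[a], seqs[b]) for a, b in pairs}

    def h(pos):
        # Sum of optimal pairwise suffix costs: a lower bound on the
        # remaining sum-of-pairs cost, hence admissible when weight == 1.
        return sum(tables[a, b][pos[a]][pos[b]] for a, b in pairs)

    start = (0,) * k
    goal = tuple(len(s) for s in seqs)
    best_g = {start: 0}
    frontier = [(weight * h(start), 0, start)]
    expanded = 0
    while frontier:
        _, g, pos = heapq.heappop(frontier)
        if g > best_g.get(pos, float('inf')):
            continue  # stale queue entry
        if pos == goal:
            return g, expanded
        expanded += 1
        # Each move consumes one symbol from a non-empty subset of sequences.
        for move in product((0, 1), repeat=k):
            if not any(move):
                continue
            nxt = tuple(p + m for p, m in zip(pos, move))
            if any(n > len(s) for n, s in zip(nxt, seqs)):
                continue
            column = [seqs[i][pos[i]] if move[i] else '-' for i in range(k)]
            step = sum(pair_cost(column[a], column[b]) for a, b in pairs)
            ng = g + step
            if ng < best_g.get(nxt, float('inf')):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + weight * h(nxt), ng, nxt))
    raise ValueError("goal unreachable")
```

Calling `align_cost(seqs)` and `align_cost(seqs, weight=1.5)` on the same input allows the cost and the number of expanded nodes to be compared. In this sketch the overestimated search is no longer guaranteed to be optimal, but its cost stays within a factor `weight` of the exact one, which is the kind of time and memory versus accuracy trade-off the abstract describes.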

Keywords

biology computing; sequences; A-star algorithm; computational biology; multiple sequence alignment; overestimation; random sequences; Bioinformatics; Computational intelligence; Costs; DNA; Dynamic programming; Error analysis; Lattices; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on

Conference_Location

Honolulu, HI

Print_ISBN

1-4244-0710-9

Type

conf

DOI

10.1109/CIBCB.2007.4221218

Filename

4221218