Title :
Overestimation for Multiple Sequence Alignment
Author :
Cazenave, Tristan
Author_Institution :
Dept. Informatique, Univ. Paris 8
Abstract :
Multiple sequence alignment is an important problem in computational biology. A-star is an algorithm that can be used to find exact alignments. We present a simple modification of the A-star algorithm that substantially improves multiple sequence alignment, in both time and memory, at the cost of a small loss in accuracy. It consists of overestimating the admissible heuristic. For random sequences of length 250, a typical speedup is 47, together with a memory gain of 13 and an error rate of 0.09%. For real sequences, the speedup can be greater than 20,000 and the memory gain greater than 150, with error rates ranging from 0.08% to 0.67% for the sequences we tested. Overestimation can align sequences that cannot be aligned with the exact algorithm.
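The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the heuristic is overestimated, so the sketch assumes the simplest option, multiplying it by a constant `weight` greater than 1 (weighted A-star). The unit mismatch and gap costs, the sum-of-pairs objective, the pairwise-suffix heuristic, and the names `suffix_costs` and `align_cost` are likewise assumptions made for illustration, not details taken from the paper.

```python
import heapq
from itertools import combinations, product

MISMATCH, GAP = 1, 1  # assumed unit costs; a gap facing a gap costs 0


def suffix_costs(a, b):
    """d[i][j] = optimal pairwise cost of aligning a[i:] with b[j:]."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m:
                options.append(d[i + 1][j + 1] + (0 if a[i] == b[j] else MISMATCH))
            if i < n:
                options.append(d[i + 1][j] + GAP)
            if j < m:
                options.append(d[i][j + 1] + GAP)
            d[i][j] = min(options)
    return d


def align_cost(seqs, weight=1.0):
    """Return (sum-of-pairs cost found, number of expanded nodes).

    weight == 1.0 keeps the heuristic admissible (exact A-star);
    weight > 1.0 overestimates it, trading optimality for a smaller search.
    """
    k = len(seqs)
    pairs = list(combinations(range(k), 2))
    tables = {(p, q): suffix_costs(seqs[p], seqs[q]) for p, q in pairs}
    goal = tuple(len(s) for s in seqs)

    def h(pos):
        # Sum of optimal pairwise suffix alignments: a lower bound on the
        # remaining sum-of-pairs cost.
        return sum(tables[p, q][pos[p]][pos[q]] for p, q in pairs)

    def step_cost(pos, move):
        cost = 0
        for p, q in pairs:
            if move[p] and move[q]:
                cost += 0 if seqs[p][pos[p]] == seqs[q][pos[q]] else MISMATCH
            elif move[p] or move[q]:
                cost += GAP
        return cost

    start = (0,) * k
    best_g = {start: 0}
    frontier = [(weight * h(start), 0, start)]
    expanded = 0
    while frontier:
        _, g, pos = heapq.heappop(frontier)
        if g > best_g[pos]:
            continue  # stale queue entry, a cheaper path was found later
        if pos == goal:
            return g, expanded
        expanded += 1
        for move in product((0, 1), repeat=k):
            if not any(move):
                continue
            nxt = tuple(x + step for x, step in zip(pos, move))
            if any(x > limit for x, limit in zip(nxt, goal)):
                continue
            ng = g + step_cost(pos, move)
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + weight * h(nxt), ng, nxt))
    raise ValueError("goal not reachable")


seqs = ["ACGTTGCA", "AGTTCA", "ACTTGGCA"]
exact_cost, exact_nodes = align_cost(seqs, weight=1.0)
over_cost, over_nodes = align_cost(seqs, weight=1.5)
print(exact_cost, exact_nodes, over_cost, over_nodes)
```

In this sketch the cost found with `weight > 1` is never below the exact cost and never more than `weight` times it, which is the standard bound for weighted A-star; whether the same bound applies to the paper's own overestimation scheme is not stated in the abstract.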
Keywords :
biology computing; sequences; A-star algorithm; computational biology; multiple sequence alignment; overestimation; random sequences; Bioinformatics; Computational biology; Computational intelligence; Costs; DNA; Dynamic programming; Error analysis; Lattices; Random sequences; Testing;
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0710-9
DOI :
10.1109/CIBCB.2007.4221218