DocumentCode :
1070595
Title :
On Subset Seeds for Protein Alignment
Author :
Roytberg, Mikhail ; Gambin, Anna ; Noé, Laurent ; Lasota, Slawomir ; Furletova, Eugenia ; Szczurek, Ewa ; Kucherov, Gregory
Author_Institution :
Inst. of Math. Problems in Biol., Pushchino, Russia
Volume :
6
Issue :
3
fYear :
2009
Firstpage :
483
Lastpage :
494
Abstract :
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus BLASTP.
Keywords :
bioinformatics; molecular biophysics; proteins; search problems; BLOSUM62 matrix; Bernoulli model; protein alignment; protein database; protein sequences; seed alphabet; similarity search; standard BLASTP seeding method; subset seeds; vector seeds; local alignment; multiple seeds; protein databases; seeds; selectivity; sensitivity; Algorithms; Amino Acid Sequence; Amino Acids; Cluster Analysis; Computational Biology; Models, Biological; ROC Curve; Sequence Alignment; Terminology as Topic
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2009.4
Filename :
4752807