  • DocumentCode
    1070595
  • Title

    On Subset Seeds for Protein Alignment

  • Author

    Roytberg, Mikhail ; Gambin, Anna ; Noé, Laurent ; Lasota, Slawomir ; Furletova, Eugenia ; Szczurek, Ewa ; Kucherov, Gregory

  • Author_Institution
    Inst. of Math. Problems in Biol., Pushchino, Russia
  • Volume
    6
  • Issue
    3
  • fYear
    2009
  • Firstpage
    483
  • Lastpage
    494
  • Abstract
    We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus BLASTP.
  • Keywords
    bioinformatics; molecular biophysics; proteins; search problems; BLOSUM62 matrix; Bernoulli model; protein alignment; protein database; protein sequences; seed alphabet; similarity search; standard BLASTP seeding method; subset seeds; vector seeds; Protein sequences; local alignment; multiple seeds; protein databases; seeds; selectivity; sensitivity; Algorithms; Amino Acid Sequence; Amino Acids; Cluster Analysis; Computational Biology; Models, Biological; Proteins; ROC Curve; Sequence Alignment; Terminology as Topic
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2009.4
  • Filename
    4752807