• DocumentCode
    397266
  • Title

    A new similarity measure among protein sequences

  • Author

    Wu, Kuen-Pin ; Lin, Hsin-Nan ; Sung, Ting-Yi ; Hsu, Wen-Lian

  • Author_Institution
    Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    347
  • Lastpage
    352
  • Abstract
    Protein sequence analysis is an important tool to decode the logic of life. One of the most important similarity measures in this area is the edit distance between amino acids of two sequences. We believe this criterion should be reconsidered because protein features are probably associated more with small peptide fragments than with individual amino acids. In this paper, we design small patterns that are associated with highly conserved regions among a set of protein sequences. These patterns are used analogously to index terms in information retrieval. Therefore, we do not consider gaps within patterns. This new similarity measure has been applied to phylogenetic tree construction, protein clustering and protein secondary structure prediction and has produced promising results.
  • Keywords
    biology computing; genetics; information retrieval; molecular biophysics; object-oriented methods; pattern clustering; proteins; sequential decoding; vocabulary; amino acids; edit distance; index terms; information retrieval; pattern design; peptide fragments; phylogenetic tree construction; protein clustering; protein secondary structure prediction; protein sequence analysis; Amino acids; Clustering algorithms; Decoding; Information analysis; Information retrieval; Information science; Logic; Peptides; Phylogeny; Protein sequence;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type

    conf

  • DOI
    10.1109/CSB.2003.1227335
  • Filename
    1227335
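
The abstract describes treating short, gapless patterns like index terms in information retrieval. The sketch below illustrates that general idea only. It is not the authors' method: the abstract does not say how the patterns are designed or scored, so the fixed-length fragments (`k = 3`), the cosine similarity, and the names `kmer_terms`, `cosine_similarity`, and `pattern_similarity` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_terms(seq: str, k: int = 3) -> Counter:
    """Count every contiguous (gapless) length-k fragment of seq.

    Assumption: fixed-length fragments stand in for the paper's
    designed patterns, which the abstract does not specify.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no terms
    return dot / (norm_a * norm_b)


def pattern_similarity(s1: str, s2: str, k: int = 3) -> float:
    """Similarity of two sequences based on shared gapless fragments."""
    return cosine_similarity(kmer_terms(s1, k), kmer_terms(s2, k))
```

Unlike edit distance, a score of this kind depends on which fragments two sequences share rather than on a residue-by-residue alignment, so no gaps are modeled inside a fragment.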