• DocumentCode
    1988458
  • Title
    A New Alignment-Independent Algorithm for Clustering Protein Sequences
  • Author
    Kelil, Abdellali ; Wang, Shengrui ; Brzezinski, Ryszard
  • Author_Institution
    Sherbrooke Univ., Sherbrooke
  • fYear
    2007
  • fDate
    14-17 Oct. 2007
  • Firstpage
    27
  • Lastpage
    34
  • Abstract
    The rapid growth of available protein data makes clustering within protein families increasingly important; the challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects biological reality, and an accurate estimate of protein sequence similarity is crucial to building such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as multi-domain, circular-permutation and tandem-repeat protein sequences, which cause many difficulties for alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is specifically designed for application to non-aligned protein sequences. It allows us to develop a new alignment-independent algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families.
  • Keywords
    biology computing; molecular biophysics; proteins; alignment-independent algorithm; amino acid subsequences; clustering protein sequences; substitution matching similarity; Algorithm design and analysis; Amino acids; Biological system modeling; Biology computing; Clustering algorithms; Databases; Evolution (biology); Phylogeny; Protein engineering; Protein sequence; Biological function; Clustering; Non-alignable; Phylogeny; Protein sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
  • Conference_Location
    Boston, MA
  • Print_ISBN
    978-1-4244-1509-0
  • Type
    conf
  • DOI
    10.1109/BIBE.2007.4375541
  • Filename
    4375541