• DocumentCode
    684744
  • Title

    A new method for clustering multi-domain protein sequences

  • Author

    Hongzhou He ; Mingtian Zhou

  • Author_Institution
    Coll. of Math. & Comput. Sci., Mianyang Normal Univ., Mianyang, China
  • fYear
    2012
  • fDate
    7-9 Dec. 2012
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    A new method for clustering multi-domain protein sequences is proposed by revising the preference value of the classical affinity propagation (AP) algorithm in combination with the Silhouette index of clustering validity. In addition, the classical substitution match similarity (SMS) between two protein sequences is generalized to meet the demands of clustering 'twilight zone' protein sequences. Experimental results on four test datasets demonstrate that the method yields a number of clusters closer to the number of families identified by phylogenetic trees, a more consistent clustering structure for a given protein dataset, and a comparative advantage in clustering multi-domain protein sequences.
  • Keywords
    biology computing; genomics; pattern clustering; proteins; trees (mathematics); affinity propagation algorithm; clustering validity; multidomain protein sequences clustering; phylogenetic trees; silhouette index; substitution match similarity; clustering; protein sequences; revised affinity propagation (RAP); similarity measure;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Titel
    Information Science and Control Engineering 2012 (ICISCE 2012), IET International Conference on
  • Conference_Location
    Shenzhen
  • Electronic_ISBN
    978-1-84919-641-3
  • Type

    conf

  • DOI
    10.1049/cp.2012.2330
  • Filename
    6755709