  • DocumentCode
    3199841
  • Title
    Regrouping of pattern clusters to reveal characteristics of distinct classes and related classes
  • Author
    Zhou, Pei-Yuan ; Lee, En-Shiun Annie ; Wong, Andrew K. C.
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China
  • fYear
    2013
  • fDate
    18-21 Dec. 2013
  • Firstpage
    55
  • Lastpage
    61
  • Abstract
    Discovering protein patterns for amino acids and their biochemical properties is important for revealing the underlying biophysical models. Pattern clustering was introduced to relate the discovered protein patterns to taxonomic classes in a localized region of a protein. Building on our previous work, this paper proposes an algorithm that synthesizes and regroups pattern clusters, maximizing their separability, in order to reveal class characteristics of the localized region of the protein. To evaluate the results of pattern clustering and of regrouping pattern clusters, we introduce three evaluation measures: F-measure, class entropy measure, and attribute entropy measure. To validate the proposed algorithm, experiments are run on synthetic data and on protein family data with amino acid attributes and with chemical property attributes. The experimental results show that: a) regrouping pattern clusters gives more accurate class separation than pattern clustering alone; b) the clusters after regrouping are more distinctly separable from each other than those from pattern clustering alone; c) two types of pattern clusters are found, one pertaining to distinct classes and the other associated with two or more related classes; and d) class characteristics are clearly revealed in the data subspace containing the patterns in the pattern clusters. The datasets with chemical properties show that unsupervised techniques can reveal common chemical attributes in the inherent classes as more of the common properties shared by different amino acids are taken into account.
  • Keywords
    molecular biophysics; pattern clustering; proteins; F-measure; amino acids; attribute entropy measure; biochemical property; class entropy measure; distinct classes; pattern clusters regrouping; protein patterns; related classes; separability; taxonomic classes; Amino acids; Clustering algorithms; Complexity theory; Entropy; Fungi; Pattern clustering; Proteins; local optimal; pattern cluster; protein functionality; regrouping; taxonomy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Type
    conf
  • DOI
    10.1109/BIBM.2013.6732718
  • Filename
    6732718