• DocumentCode
    3714564
  • Title

    A multi-stage protein secondary structure prediction system using machine learning and information theory

  • Author

    Masood Zamani;Stefan C. Kremer

  • Author_Institution
    School of Computer Science, University of Guelph, Canada
  • fYear
    2015
  • Firstpage
    1304
  • Lastpage
    1309
  • Abstract
    In this paper, we evaluated the performance of a multi-stage protein secondary structure (PSS) prediction model. The proposed classifier uses statistical information and protein profiles. The statistical information is derived from protein sequences and structures using k-means clustering and information theory. In the first stage, a feed-forward artificial neural network maps a sequence fragment to a region in the Ramachandran plot (2D-plot). A score vector is then constructed from the mapped region using clustering and statistical information. The score vector represents the tendency of an identified region in the 2D-plot to pair with each secondary structure for a residue. The score vectors, which are used in the second stage, have fewer dimensions than the input vectors commonly derived from protein sequences or profile information. In the second stage, a two-tier classifier based on an artificial neural network and a genetic programming (GP) method is employed. The GP method uses IF rules for three-state classification. The two-tier classifier's performance is compared to that of two-tier artificial neural networks (ANNs) and support vector machines (SVMs). The prediction method is examined on a common protein dataset, RS126. The performance of the proposed classification model is measured by Q3 and segment overlap (SOV) scores. The proposed PSS prediction model improves the Q3 score by over 3% and the SOV score by 2% compared to the two-tier ANN and SVM architectures.
  • Keywords
    "Artificial neural networks","Information theory","Proteins","Support vector machines"
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/BIBM.2015.7359867
  • Filename
    7359867
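
The Q3 score named in the abstract is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `q3_score` and the example strings are illustrative assumptions. The SOV score is segment-based and is not covered here.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues with a correctly predicted
    three-state label (H, E, or C).

    Assumption: both arguments are equal-length strings with one
    label per residue; this mirrors the usual per-residue Q3 definition.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count the positions where the predicted and observed labels agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = "CCHHHHEEEC"   # hypothetical reference labels
    predicted = "CCHHHCEEEC"  # hypothetical prediction, one residue wrong
    print(q3_score(predicted, observed))  # 9 of 10 residues match
```

In practice Q3 is typically computed per protein and then averaged over a dataset such as RS126, or pooled over all residues; which convention the paper uses is not stated in the abstract.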