• DocumentCode
    893430
  • Title
    Feature Selection and Combination Criteria for Improving Accuracy in Protein Structure Prediction
  • Author
    Lin, Ken-Li ; Lin, Chun-Yuan ; Huang, Chuen-Der ; Chang, Hsiu-Ming ; Yang, Chiao-Yun ; Lin, Chin-Teng ; Tang, Chuan Yi ; Hsu, D. Frank
  • Author_Institution
    Dept. of Electr. & Control Eng., Nat. Chiao-Tung Univ., Hsin-Chu
  • Volume
    6
  • Issue
    2
  • fYear
    2007
  • fDate
    6/1/2007
  • Firstpage
    186
  • Lastpage
    196
  • Abstract
    The classification of protein structures is essential for determining protein function in bioinformatics. A reasonably high prediction accuracy has already been achieved in classifying proteins into the four structural classes of the SCOP database from their primary amino acid sequences. Further classification into fine-grained folding categories, however, remains challenging, especially when the number of possible folding patterns, such as those defined in the SCOP database, is large. In our previous work, we proposed a two-level classification strategy called the hierarchical learning architecture (HLA), which uses neural networks and two indirect coding features to differentiate proteins by class and folding pattern, and which achieved an accuracy rate of 65.5%. In this paper, we use a combinatorial fusion technique to facilitate feature selection and combination for improving predictive accuracy in protein structure classification. When various combinatorial fusion criteria are applied to a protein fold prediction approach that uses neural networks with HLA and the radial basis function network (RBFN), the resulting classifier achieves an overall prediction accuracy of 87% for the four classes and 69.6% for 27 folding categories. These rates are significantly higher than the 56.5% accuracy previously reported by Ding and Dubchak. Our results demonstrate that data fusion is a viable method for feature selection and combination in protein structure prediction and classification.
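    The abstract names combinatorial fusion criteria and rank/score functions without spelling them out. Below is a minimal Python sketch of the two basic combination criteria, score combination and rank combination, plus a diversity measure between two rank/score functions. It is not the authors' implementation: the class labels, the two hypothetical per-feature classifier outputs (`composition`, `secondary_structure`), min-max normalization, plain averaging, and the Euclidean diversity measure are all assumptions made for illustration.

```python
# Hypothetical sketch of combinatorial fusion (score vs. rank combination).
# Not the paper's implementation: labels, scores, min-max normalization,
# plain averaging, and the Euclidean diversity measure are assumptions.
from math import sqrt
from typing import Dict, List

Scores = Dict[str, float]  # class label -> output score of one classifier


def normalize(scores: Scores) -> Scores:
    """Min-max normalize one classifier's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {label: 0.0 for label in scores}
    return {label: (s - lo) / (hi - lo) for label, s in scores.items()}


def rank(scores: Scores) -> Dict[str, int]:
    """Rank labels 1..n by descending score (ties broken alphabetically)."""
    ordered = sorted(scores, key=lambda label: (-scores[label], label))
    return {label: i + 1 for i, label in enumerate(ordered)}


def score_combination(systems: List[Scores]) -> Scores:
    """Average normalized scores; the highest combined score wins."""
    normed = [normalize(s) for s in systems]
    return {label: sum(s[label] for s in normed) / len(normed)
            for label in normed[0]}


def rank_combination(systems: List[Scores]) -> Dict[str, float]:
    """Average ranks; the lowest combined rank wins."""
    ranked = [rank(s) for s in systems]
    return {label: sum(r[label] for r in ranked) / len(ranked)
            for label in ranked[0]}


def rank_score_function(scores: Scores) -> List[float]:
    """f(i) = normalized score of the label placed at rank i (i = 1..n)."""
    return sorted(normalize(scores).values(), reverse=True)


def diversity(a: Scores, b: Scores) -> float:
    """Euclidean distance between two rank/score curves (assumed measure)."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Hypothetical outputs of two feature-specific classifiers for one protein,
# over the four SCOP structural classes.
composition = {"all-alpha": 0.9, "all-beta": 0.2,
               "alpha/beta": 0.5, "alpha+beta": 0.4}
secondary_structure = {"all-alpha": 0.3, "all-beta": 0.1,
                       "alpha/beta": 0.8, "alpha+beta": 0.6}

sc = score_combination([composition, secondary_structure])
rc = rank_combination([composition, secondary_structure])
print(max(sc, key=sc.get), min(rc, key=rc.get))  # prints "alpha/beta alpha/beta"
print(round(diversity(composition, secondary_structure), 4))  # prints 0.2857
```

    With these made-up scores, the composition classifier alone favors all-alpha, while both fused criteria pick alpha/beta. The diversity value measures how differently the two classifiers spread their scores across ranks, which is the kind of difference a diversity rank/score graph is meant to expose when choosing which features to combine.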
  • Keywords
    biology computing; learning (artificial intelligence); molecular biophysics; prediction theory; proteins; radial basis function networks; sensor fusion; SCOP database; bioinformatics; combination criteria; combinatorial fusion technique; data fusion; feature selection; fine-grained folding; hierarchical learning architecture; indirect coding; neural networks; primary amino acid sequences; protein structure prediction; radial basis function network; two-level classification; Accuracy; Amino acids; Bioinformatics; Computer science; Control engineering; Data analysis; Neural networks; Proteins; Radial basis function networks; Spatial databases; Combinatorial fusion analysis (CFA); data fusion; diversity rank/score graph; hierarchical learning architecture (HLA); neural network (NN); protein structure prediction; radial basis function network (RBFN); rank/score functions; Algorithms; Artificial Intelligence; Computer Simulation; Databases, Protein; Information Storage and Retrieval; Models, Chemical; Models, Molecular; Pattern Recognition, Automated; Proteins; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, Protein;
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2007.897482
  • Filename
    4220633