• DocumentCode
    40410
  • Title
    A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition
  • Author
    Dehzangi, Abdollah ; Paliwal, Kuldip ; Lyons, James ; Sharma, Ashok ; Sattar, Abdul
  • Author_Institution
    Inst. for Integrated & Intell. Syst., Griffith Univ., Brisbane, QLD, Australia
  • Volume
    11
  • Issue
    3
  • fYear
    2014
  • fDate
    May-June 2014
  • Firstpage
    510
  • Lastpage
    519
  • Abstract
    Protein fold recognition (PFR) is considered an important step towards solving the protein structure prediction problem. Despite the efforts made so far, finding an accurate and fast computational approach to PFR remains a challenging problem in bioinformatics and computational biology. In this study, we propose a segmentation-based feature extraction technique that captures local evolutionary information embedded in the position-specific scoring matrix (PSSM) and structural information embedded in the secondary structure of proteins predicted using SPINE-X. We also employ occurrence features to extract global discriminatory information from PSSM and SPINE-X. By applying a support vector machine (SVM) to the extracted features, we improve protein fold prediction accuracy by 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy for a data set consisting of proteins with less than 25 percent sequence similarity, and 80.7 percent prediction accuracy for a data set of proteins belonging to 110 folds with less than 40 percent sequence similarity. Finally, we investigate the relation between the number of folds and the number of features used, and show that the number of features should be increased to obtain better protein fold prediction results when the number of folds is relatively large.
  • Keywords
    bioinformatics; feature extraction; molecular configurations; proteomics; support vector machines; PFR fast computational approach; PSSM; SPINE-X; SVM; computational biology; global discriminatory information; local evolutionary information; position specific scoring matrix; protein evolutionary feature extraction; protein fold prediction accuracy enhancement; protein fold recognition; protein predicted secondary structure; protein structural feature extraction; protein structure prediction; segmentation based method; segmented based feature extraction; structural information; support vector machine; Accuracy; Amino acids; Data mining; Feature extraction; Protein sequence; Support vector machines; Protein fold recognition; evolutionary-based features; occurrence; segmented auto covariance; segmented distribution; structural-based features; support vector machine (SVM)
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.2296317
  • Filename
    6693731