DocumentCode :
40410
Title :
A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition
Author :
Dehzangi, Abdollah ; Paliwal, Kuldip ; Lyons, James ; Sharma, Ashok ; Sattar, Abdul
Author_Institution :
Inst. for Integrated & Intell. Syst., Griffith Univ., Brisbane, QLD, Australia
Volume :
11
Issue :
3
fYear :
2014
fDate :
May-June 2014
Firstpage :
510
Lastpage :
519
Abstract :
Protein fold recognition (PFR) is considered an important step towards solving the protein structure prediction problem. Despite the efforts made so far, finding an accurate and fast computational approach to PFR remains a challenging problem in bioinformatics and computational biology. In this study, we propose a segmentation-based feature extraction technique to capture local evolutionary information embedded in the position-specific scoring matrix (PSSM) and structural information embedded in the secondary structure of proteins predicted using SPINE-X. We also employ the concept of an occurrence feature to extract global discriminatory information from PSSM and SPINE-X. By applying a support vector machine (SVM) to our extracted features, we improve protein fold prediction accuracy by 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy for a data set consisting of proteins with less than 25 percent sequence similarity and 80.7 percent prediction accuracy for a data set of proteins belonging to 110 folds with less than 40 percent sequence similarity. We also investigate the relation between the number of folds and the number of features used, and show that the number of features should be increased to obtain better protein fold prediction results when the number of folds is relatively large.
Keywords :
bioinformatics; feature extraction; molecular configurations; proteomics; support vector machines; PFR fast computational approach; PSSM; SPINE-X; SVM; bioinformatics; computational biology; global discriminatory information; local evolutionary information; position specific scoring matrix; protein evolutionary feature extraction; protein fold prediction accuracy enhancement; protein fold recognition; protein predicted secondary structure; protein structural feature extraction; protein structure prediction; segmentation based method; segmented based feature extraction; structural information; support vector machine; Accuracy; Amino acids; Data mining; Feature extraction; Protein sequence; Support vector machines; Protein fold recognition; evolutionary-based features; feature extraction; occurrence; segmented auto covariance; segmented distribution; structural-based features; support vector machine (SVM);
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.2296317
Filename :
6693731