DocumentCode :
69944
Title :
An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information
Author :
Leyi Wei ; Minghong Liao ; Xing Gao ; Quan Zou
Author_Institution :
Sch. of Software, Xiamen Univ., Xiamen, China
Volume :
14
Issue :
4
fYear :
2015
fDate :
Jun-15
Firstpage :
339
Lastpage :
349
Abstract :
Protein structural class information is beneficial for secondary and tertiary structure prediction, protein fold prediction, and protein function analysis, so predicting protein structural classes is of vital importance. In recent years, several computational methods have been developed for predicting the structural classes of low-sequence-similarity (25%-40%) proteins. However, the reported prediction accuracies are not yet satisfactory. To further improve prediction accuracy, we propose three different feature extraction methods and construct a comprehensive feature set that captures both sequence and structure information. By applying a random forest (RF) classifier to this feature set, we develop a novel method for structural class prediction. We test the proposed method on three benchmark datasets (25PDB, 640, and 1189) with low sequence similarity and obtain overall prediction accuracies of 93.5%, 92.6%, and 93.4%, respectively. Compared with six competing methods, these accuracies are 3.4%, 6.2%, and 8.7% higher than those of the best-performing methods, showing the superiority of our method. Moreover, because the three benchmark datasets are limited in size, we further test the proposed method on three updated large-scale datasets with different sequence similarities (40%, 30%, and 25%). The proposed method achieves accuracies above 90% on all three datasets, consistent with its accuracies on the three benchmark datasets. The experimental results suggest that our method is an effective and promising tool for structural class prediction. Currently, a webserver that implements the proposed method is available at http://121.192.180.204:8080/RF_PSCP/Index.html.
Keywords :
biological techniques; biology computing; feature extraction; molecular biophysics; molecular configurations; proteins; random processes; RF; benchmark datasets; comprehensive feature set; computational method; feature extraction method; improved protein structural classes prediction method; large-scale datasets; low-sequence-similarity protein structural classes prediction; protein fold prediction; protein function analysis; protein structural classes information; random forest classifier; secondary structure prediction; sequence information; structure information; tertiary structure prediction; Accuracy; Amino acids; Benchmark testing; Feature extraction; Proteins; Radio frequency; Vectors; Feature extraction; protein structural classes; random forest;
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2014.2352454
Filename :
6898821