Author_Institution :
Sch. of Software, Xiamen Univ., Xiamen, China
Abstract :
Information on protein 3-dimensional (3D) structures plays an essential role in molecular biology, cell biology, biomedicine, and drug design. Protein fold prediction is considered an immediate step toward deciphering protein 3D structures and is therefore one of the fundamental problems in structural bioinformatics. Recently, numerous taxonomic methods have been developed for protein fold prediction. Although much progress has been made, the overall prediction accuracies achieved by existing taxonomic methods remain unsatisfactory. To address this problem, we propose a novel taxonomic method, called PFPA, which combines a novel feature set with an ensemble classifier. In particular, sequential evolution information from PSI-BLAST profiles and local and global secondary structure information from PSI-PRED profiles are combined to construct a comprehensive feature set. Experimental results demonstrate that PFPA outperforms state-of-the-art predictors. Specifically, when tested on the independent testing set of a benchmark dataset, PFPA achieves an overall accuracy of 73.6%, the highest accuracy reported to date. Moreover, PFPA performs well, without significant performance degradation, on three updated large-scale datasets, indicating its robustness and generalization ability. A webserver that implements PFPA is currently freely available at http://121.192.180.204:8080/PFPA/Index.html.
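The abstract does not state how the two feature groups are merged or which base classifiers make up the ensemble. As a rough illustration only, the sketch below assumes the simplest choices: the evolution-based and structure-based feature vectors are concatenated, and the ensemble decides by majority vote. The function names, feature values, and fold labels are hypothetical and are not taken from PFPA.

```python
from collections import Counter


def combine_features(evolution_features, structure_features):
    """Concatenate two per-protein feature vectors (assumed combination scheme)."""
    return list(evolution_features) + list(structure_features)


def majority_vote(predicted_folds):
    """Return the fold label predicted by the largest number of base classifiers."""
    return Counter(predicted_folds).most_common(1)[0][0]


# Hypothetical values, for illustration only.
# The first vector stands in for features derived from a PSI-BLAST profile,
# the second for features derived from a PSI-PRED profile.
features = combine_features([0.12, 0.40], [0.30, 0.05, 0.65])

# Hypothetical fold labels emitted by three base classifiers for one protein.
fold = majority_vote(["a.1", "b.1", "a.1"])
```

Majority voting is only one possible ensemble rule; weighted voting or averaging of class probabilities would fit the same outline.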
Keywords :
benchmark testing; bioinformatics; feature extraction; molecular biophysics; molecular configurations; pattern classification; proteins; PFPA; PSI-BLAST; benchmark dataset; biomedicine; cell biology; drug design; enhanced protein fold prediction method; ensemble classifier; feature extraction technique; feature set; global secondary structure information; independent testing set; local secondary structure information; molecular biology; protein 3-dimensional structures; sequential evolution information; state-of-the-art predictors; structural bioinformatics; taxonomic methods; updated large-scale datasets; Accuracy; Amino acids; Feature extraction; Protein engineering; Proteins; Testing; Three-dimensional displays; Ensemble classifier; feature extraction; protein fold prediction;