Title :
Predicting non-classical secretory proteins by using Gene Ontology terms and physicochemical properties
Author :
Huang, Wen-Lin ; Liaw, Chyn ; Ho, Shinn-Ying
Author_Institution :
Dept. of Manage. Inf. Syst., Asia Pacific Inst. of Creativity, Miaoli, Taiwan
Abstract :
Eukaryotic secretory proteins that traverse the classical ER-Golgi pathway are usually characterized by short N-terminal signal peptides. However, several secretory proteins lacking signal peptides have been found to be exported by a non-classical secretion pathway. A computational approach that predicts non-classical secretory proteins without relying on N-terminal signal peptides is therefore needed. Several prediction methods using various types of features have been proposed to predict secretory proteins, but their prediction performance remains unsatisfactory. This study proposes an SVM-based prediction method, ProSec-iGOX, which uses a major set of informative Gene Ontology (GO) terms and a minor set of assistance features. Physicochemical properties serve as the assistance features and are useful when a query protein sequence has no homologous protein with annotated GO terms. Two data sets, S25 and S40, with sequence identity of 25% and 40%, respectively, are adopted for performance comparisons. ProSec-iGOX yields test accuracies of 95.1% and 96.8% on the data sets S25 and S40, respectively. The latter accuracy (96.8%) is significantly higher than that of SPRED (82.2%), which uses the frequency of tri-peptides and short peptides, secondary structure, and physicochemical properties as input features to a random forest classifier. The experimental results show that GO terms are effective features for predicting non-classical secretory proteins.
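As a rough illustration of the feature design described in the abstract, the sketch below builds one feature vector from a binary GO-term part and a single physicochemical value. It is a minimal sketch under stated assumptions, not the ProSec-iGOX implementation: the GO terms listed, the choice of the Kyte-Doolittle hydropathy index, and all function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the GO terms, the hydropathy scale, and all
# helper names below are assumptions, not details taken from the paper.

# Hypothetical "informative" GO terms; the paper's selected set is not given here.
INFORMATIVE_GO_TERMS = ["GO:0005576", "GO:0005615", "GO:0005737", "GO:0005634"]

# Kyte-Doolittle hydropathy index, used as one example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def go_features(annotated_terms):
    """Binary vector: 1.0 where an informative GO term is annotated, else 0.0."""
    annotated = set(annotated_terms)
    return [1.0 if term in annotated else 0.0 for term in INFORMATIVE_GO_TERMS]


def mean_hydropathy(sequence):
    """Average hydropathy over residues with a known index (0.0 if none)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def build_feature_vector(sequence, annotated_terms):
    """Concatenate the GO-term part with the physicochemical part."""
    return go_features(annotated_terms) + [mean_hydropathy(sequence)]


# A query with one annotated GO term, and one with no GO annotation at all.
with_go = build_feature_vector("MKVLA", ["GO:0005615"])
without_go = build_feature_vector("MKVLA", [])
```

When no GO annotation is available, the GO part is all zeros and only the physicochemical part carries information, which is the situation the assistance features are meant to cover. Vectors of this shape could then be passed to any SVM implementation for training.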
Keywords :
bioinformatics; proteins; proteomics; support vector machines; Gene Ontology term; N-terminal signal peptide; ProSec-iGOX; SVM-based prediction method; eukaryotic secretory protein; nonclassical secretory protein; physicochemical property; prediction performance; traverse classical ER-Golgi pathway; Accuracy; Databases; Feature extraction; Ontologies; Peptides; Proteins; Training; Gene Ontology; non-classical secretion; physicochemical properties; secretory; signal peptides;
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2011 IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-8727-1
DOI :
10.1109/CSAE.2011.5952841