Title :
EBGW_OMP: A sequence-based method for accurate prediction of outer membrane proteins
Author :
Lingyun Zou ; Qingshan Ni ; Fuquan Hu
Author_Institution :
Dept. of Microbiol., Third Mil. Med. Univ., Chongqing, China
Abstract :
Outer membrane proteins (OMPs) play important roles in bacterial cellular processes. Discriminating OMPs from proteins of other fold types aids the prediction of their structures and the design of OMP-targeted drugs. In this paper, we developed a novel prediction method based on primary sequence features and the support vector machine (SVM) algorithm. Discriminative features were extracted from protein sequences by combining sequence encoding based on grouped weights (EBGW), amino acid composition and biochemical properties. Feature subsets were screened using the F-score algorithm to train an SVM-based classifier, named EBGW_OMP. The performance of EBGW_OMP was examined on a benchmark dataset of 1087 proteins. The results show that EBGW_OMP discriminates OMPs from globular proteins, α-helical membrane proteins and non-OMPs with cross-validated accuracies of 98.0%, 97.6% and 97.9%, respectively, outperforming existing sequence-based methods. EBGW_OMP also correctly identified 681 of 722 OMPs, with 97.0% accuracy, in another benchmark dataset of 2657 proteins. Genome-wide tests show that EBGW_OMP has an excellent capability for correctly detecting OMPs and is suitable for genomic OMP prediction. A web server implementing EBGW_OMP is freely accessible at http://bioinfo.tmmu.edu.cn/EBGW_OMP.
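The abstract names F-score screening of sequence features but does not give the formula. As a rough illustration only, the sketch below assumes the widely used two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances) and uses plain amino acid composition as the feature vector. The function names (aa_composition, f_scores, top_features) are hypothetical and not taken from the paper, and the EBGW encoding and biochemical-property features are not reproduced here.

```python
from statistics import mean, variance

# The 20 standard amino acids, in a fixed order that defines the feature index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the frequency of each standard amino acid in a sequence."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def f_scores(pos, neg):
    """Two-class F-score per feature (assumed definition, see lead-in).

    pos and neg are lists of equal-length feature vectors; each class
    needs at least two samples so that the sample variance is defined.
    """
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        overall = mean(p + q)
        between = (mean(p) - overall) ** 2 + (mean(q) - overall) ** 2
        within = variance(p) + variance(q)
        # A feature that is constant within both classes gets score 0 here.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_features(pos, neg, k):
    """Indices of the k features with the highest F-score."""
    scores = f_scores(pos, neg)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In a pipeline of this kind, the selected feature indices would then be used to slice the feature vectors before training an SVM; the number of features kept and the SVM settings are choices the abstract does not specify.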
Keywords :
biochemistry; bioinformatics; biomembranes; classification; drugs; encoding; feature extraction; genomics; macromolecules; medical computing; molecular configurations; molecular weight; proteins; proteomics; sequences; statistical analysis; support vector machines; α-helical membrane proteins; EBGW_OMP training; F-score algorithm; OMP detection; OMP discrimination; OMP-targeted drug designs; SVM algorithms; SVM-based classifier training; amino acid compositions; bacterial cellular processes; biochemical properties; cross-validation accuracy; discriminative feature extraction; feature subset screening; genome-wide tests; genomic OMP prediction; globular proteins; nonOMPs; outer membrane protein prediction; primary sequence features; protein benchmark dataset; protein fold types; protein sequences; protein structure prediction; sequence EBGW; sequence encoding based on grouped weights; sequence-based method; support vector machine; Accuracy; Amino acids; Biomembranes; Genomics; Proteins; Support vector machines; Vectors; EBGW; feature selection; machine learning; support vector machine;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
DOI :
10.1109/CIBCB.2014.6845502