• DocumentCode
    167263
  • Title

    EBGW_OMP: A sequence-based method for accurate prediction of outer membrane proteins

  • Author

    Lingyun Zou ; Qingshan Ni ; Fuquan Hu

  • Author_Institution
    Dept. of Microbiol., Third Mil. Med. Univ., Chongqing, China
  • fYear
    2014
  • fDate
    21-24 May 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Outer membrane proteins (OMPs) play important roles in bacterial cellular processes. Discriminating OMPs from proteins of other fold types aids the prediction of their structures and the design of OMP-targeted drugs. In this paper, we develop a novel prediction method based on primary sequence features and support vector machine (SVM) algorithms. Discriminative features were extracted from protein sequences by combining sequence encoding based on grouped weights (EBGW), amino acid compositions, and biochemical properties. Feature subsets were screened using the F-score algorithm to train an SVM-based classifier, named EBGW_OMP. The performance of EBGW_OMP was examined on a benchmark dataset of 1087 proteins. The results show that EBGW_OMP can discriminate OMPs from globular proteins, α-helical membrane proteins, or non-OMPs with cross-validated accuracies of 98.0%, 97.6%, and 97.9%, respectively, outperforming existing sequence-based methods. On another benchmark dataset of 2657 proteins, EBGW_OMP correctly identified 681 of 722 OMPs, with an accuracy of 97.0%. Genome-wide tests show that EBGW_OMP detects OMPs reliably and is well suited to genomic OMP prediction. A web server implementing EBGW_OMP is freely accessible at http://bioinfo.tmmu.edu.cn/EBGW_OMP.
  • Keywords
    biochemistry; bioinformatics; biomembranes; classification; drugs; encoding; feature extraction; genomics; macromolecules; medical computing; molecular configurations; molecular weight; proteins; proteomics; sequences; statistical analysis; support vector machines; α-helical membrane proteins; EBGW_OMP training; F-score algorithm; OMP detection; OMP discrimination; OMP-targeted drug designs; SVM algorithms; SVM-based classifier training; amino acid compositions; bacterial cellular processes; biochemical properties; cross-validation accuracy; discriminative feature extraction; feature subset screening; genome-wide tests; genomic OMP prediction; globular proteins; nonOMPs; outer membrane protein prediction; primary sequence features; protein benchmark dataset; protein fold types; protein sequences; protein structure prediction; sequence EBGW; sequence encoding based on grouped weights; sequence-based method; support vector machine; Accuracy; Amino acids; Biomembranes; Genomics; Proteins; Support vector machines; Vectors; EBGW; feature selection; machine learning; support vector machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
  • Conference_Location
    Honolulu, HI
  • Type

    conf

  • DOI
    10.1109/CIBCB.2014.6845502
  • Filename
    6845502
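
The abstract mentions screening feature subsets with the F-score algorithm before training the SVM. As a rough illustration only, the sketch below computes the commonly used two-class F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances) and ranks features by it. The function names `f_score` and `rank_features` and the toy data are assumptions made for this example; the record does not describe the authors' actual implementation, feature set, or selection threshold.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of a single feature.

    pos, neg: lists of that feature's values in the positive and
    negative classes (each needs at least two samples).
    """
    overall = mean(pos + neg)
    # Between-class term: squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sum of the sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X_pos, X_neg):
    """Return (feature indices sorted by descending F-score, list of scores).

    X_pos, X_neg: lists of equal-length feature vectors (hypothetical layout).
    """
    n_features = len(X_pos[0])
    scores = [
        f_score([row[j] for row in X_pos], [row[j] for row in X_neg])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data, not from the paper: feature 0 separates the classes, feature 1 does not.
    X_pos = [[1.0, 0.5], [1.2, 0.4], [0.9, 0.6]]
    X_neg = [[0.1, 0.5], [0.2, 0.6], [0.0, 0.4]]
    order, scores = rank_features(X_pos, X_neg)
    print(order, scores)
```

In a pipeline of the kind the abstract outlines, the top-ranked features would then be passed to an SVM trainer; how many features to keep is a tuning choice that this record does not specify.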