• DocumentCode
    1785089
  • Title

    Prediction of human disease-specific phosphorylation sites with combined feature selection approach and support vector machine

  • Author

    Xiaoyi Xu ; Ao Li ; Minghui Wang

  • Author_Institution
    Sch. of Inf. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    23
  • Lastpage
    30
  • Abstract
    Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. It has long been recognized that protein phosphorylation is closely related to disease, and many studies have therefore been undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, no method focuses on predicting disease-associated phosphorylation sites. Herein, for the first time, we propose a novel SVM-based approach specifically designed to identify disease-specific phosphorylation sites. Human disease-associated phosphorylation data are extracted from the PhosphoSitePlus database, and local sequences are derived for training. To take full advantage of sequence information, we develop a combined feature selection method-based SVM (CFS-SVM) that incorporates an mRMR filtering process and a forward feature selection process. With CFS-SVM, we successfully predict disease-specific phosphorylation sites. Performance evaluation shows that CFS-SVM performs significantly better than widely used classifiers, including Bayesian decision theory and k-nearest neighbour. Even at an extremely high specificity of 99%, CFS-SVM still achieves high sensitivity. In addition, analysis of the corresponding kinases and selected features sheds light on the potential mechanisms of disease-phosphorylation relationships and can guide further experimental validation.
  • Keywords
    association; biochemistry; bioinformatics; cellular biophysics; classification; data analysis; diseases; enzymes; feature extraction; feature selection; filters; learning (artificial intelligence); medical computing; molecular biophysics; molecular configurations; reaction kinetics theory; sequences; support vector machines; Bayesian decision theory; CFS-SVM method; PhosphoSitePlus database; cellular process regulation; classifier; data extraction; disease treatment; disease-phosphorylation relationship mechanism; disease-specific phosphorylation site identification; drug design; feature analysis; feature selection-based SVM; forward feature selection process; human disease-associated phosphorylation data; human disease-specific phosphorylation site prediction; k-nearest neighbour; kinase analysis; local sequence derivation; mRMR filtering; performance evaluation; post translational modification; protein phosphorylation; sensitivity; sequence information; specificity; support vector machine; training; Alzheimer's disease; Amino acids; Cancer; Feature extraction; Proteins; Support vector machines; disease-specific; feature selection; phosphorylation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type

    conf

  • DOI
    10.1109/BIBM.2014.6999299
  • Filename
    6999299
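
The abstract mentions deriving local sequences around phosphorylation sites and filtering features with mRMR before forward selection. The following is only a minimal sketch of those two ideas, not the authors' implementation: the abstract does not state the window length, the feature encoding, or the mRMR variant, so the `flank` size, the `-` padding character, the use of window positions as discrete features, and the "relevance minus mean redundancy" criterion are all assumptions, and the function names are hypothetical. The forward feature selection and SVM stages are left out.

```python
from collections import Counter
from math import log2


def local_window(sequence, site_index, flank=3, pad="-"):
    """Return the residues within `flank` positions of a 0-based site index.

    The window length and the padding character are illustrative assumptions.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site_index : site_index + 2 * flank + 1]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels, k):
    """Greedily pick k column indices by relevance minus mean redundancy.

    `columns[j]` holds the value of feature j for every sample, for example
    the residue found at window position j. This difference-based criterion
    is one common mRMR formulation, assumed here for illustration.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:

        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            )
            return relevance[j] - redundancy / len(selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy call such as `mrmr_rank(["SSTT", "SSTT", "PAPA"], [0, 0, 0, 1], 2)`, the second column duplicates the first, so the redundancy term steers the second pick toward the third column instead. In a fuller pipeline, the ranked features would then be passed to a forward selection loop wrapped around a classifier.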