DocumentCode
1785089
Title
Prediction of human disease-specific phosphorylation sites with combined feature selection approach and support vector machine
Author
Xiaoyi Xu ; Ao Li ; Minghui Wang
Author_Institution
Sch. of Inf. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2014
fDate
2-5 Nov. 2014
Firstpage
23
Lastpage
30
Abstract
Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. It has long been recognized that protein phosphorylation is closely related to disease, and much research has therefore been devoted to predicting phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, no method focuses on predicting disease-associated phosphorylation sites. Here, for the first time, we propose an SVM-based approach specifically designed to identify disease-specific phosphorylation sites. Human disease-associated phosphorylation data are extracted from the PhosphoSitePlus database, and local sequences are derived for training. To take full advantage of sequence information, we develop a combined feature selection method-based SVM (CFS-SVM) that incorporates an mRMR filtering process and a forward feature selection process. With CFS-SVM, we successfully predict disease-specific phosphorylation sites. Performance evaluation shows that CFS-SVM performs significantly better than widely used classifiers, including Bayesian decision theory and k-nearest neighbour. Even at an extremely high specificity of 99%, CFS-SVM still achieves high sensitivity. In addition, analysis of the corresponding kinases and selected features sheds light on the potential mechanisms of disease-phosphorylation relationships and can guide further experimental validation.
Keywords
association; biochemistry; bioinformatics; cellular biophysics; classification; data analysis; diseases; enzymes; feature extraction; feature selection; filters; learning (artificial intelligence); medical computing; molecular biophysics; molecular configurations; reaction kinetics theory; sequences; support vector machines; Bayesian decision theory; CFS-SVM method; PhosphoSitePlus database; cellular process regulation; classifier; data extraction; disease treatment; disease-phosphorylation relationship mechanism; disease-specific phosphorylation site identification; drug design; feature analysis; feature selection-based SVM; forward feature selection process; human disease-associated phosphorylation data; human disease-specific phosphorylation site prediction; k-nearest neighbour; kinase analysis; local sequence derivation; mRMR filtering; performance evaluation; post translational modification; protein phosphorylation; sensitivity; sequence information; specificity; support vector machine; training; Alzheimer's disease; Amino acids; Cancer; Feature extraction; Proteins; Support vector machines; disease-specific; feature selection; phosphorylation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location
Belfast
Type
conf
DOI
10.1109/BIBM.2014.6999299
Filename
6999299