DocumentCode :
1785089
Title :
Prediction of human disease-specific phosphorylation sites with combined feature selection approach and support vector machine
Author :
Xiaoyi Xu ; Ao Li ; Minghui Wang
Author_Institution :
Sch. of Inf. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
23
Lastpage :
30
Abstract :
Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. It has long been recognized that protein phosphorylation is closely related to disease, and much research has therefore been undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, no method focuses on predicting disease-associated phosphorylation sites. Herein, for the first time, we propose a novel SVM-based approach specifically designed to identify disease-specific phosphorylation sites. Human disease-associated phosphorylation data are extracted from the PhosphoSitePlus database, and local sequences are derived for training. To take full advantage of sequence information, we develop a combined feature selection method-based SVM (CFS-SVM) that incorporates an mRMR filtering process and a forward feature selection process. With CFS-SVM, we successfully predict disease-specific phosphorylation sites. Performance evaluation shows that CFS-SVM performs significantly better than widely used classifiers, including Bayesian decision theory and k-nearest neighbour. Even at an extremely high specificity of 99%, CFS-SVM still achieves high sensitivity. In addition, analysis of the corresponding kinases and selected features sheds light on the potential mechanisms of disease-phosphorylation relationships and can guide further experimental validation.
Keywords :
association; biochemistry; bioinformatics; cellular biophysics; classification; data analysis; diseases; enzymes; feature extraction; feature selection; filters; learning (artificial intelligence); medical computing; molecular biophysics; molecular configurations; reaction kinetics theory; sequences; support vector machines; Bayesian decision theory; CFS-SVM method; PhosphoSitePlus database; cellular process regulation; classifier; data extraction; disease treatment; disease-phosphorylation relationship mechanism; disease-specific phosphorylation site identification; drug design; feature analysis; feature selection-based SVM; forward feature selection process; human disease-associated phosphorylation data; human disease-specific phosphorylation site prediction; k-nearest neighbour; kinase analysis; local sequence derivation; mRMR filtering; performance evaluation; post translational modification; protein phosphorylation; sensitivity; sequence information; specificity; support vector machine; training; Alzheimer's disease; Amino acids; Cancer; Feature extraction; Proteins; Support vector machines; disease-specific; feature selection; phosphorylation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999299
Filename :
6999299