Title :
Feature selection and granular SVM classification for protein arginine methylation identification
Author :
Ding, Zejin ; Zhang, Yan-Qing ; Zheng, Yujun George
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
Abstract :
Protein methylation was discovered half a century ago but remains far less studied than other modifications. Computational analysis has recently been introduced to discover unknown methylation sites from the few known ones. To predict possible methylation sites effectively, a sophisticated classification strategy must be devised. In this paper, we first extract informative features from methylated fragments in many protein sequences, including the physicochemical properties, secondary structure information, evolutionary profiles, and solvent accessibility of the surrounding residues. Then, an efficient feature selection method, minimum redundancy maximum relevance (mRMR), is applied to eliminate redundant features while keeping important ones. Because methylated residues are far fewer than non-methylated ones, the collected data are relatively imbalanced. We therefore propose using the granular support vector machine (GSVM), which is specifically designed for imbalanced classification problems. A 7-fold cross validation shows that our strategy achieves prediction accuracy comparable to, or even better than, that of many current methods. Our method also provides insights into the underlying mechanisms of protein methylation.
Keywords :
biology computing; feature extraction; molecular biophysics; pattern classification; proteins; support vector machines; 7-fold cross validation; evolutionary profile; feature selection; granular SVM classification; granular support vector machine; informative feature extraction; methylated fragment; methylated residue; physicochemical property; protein arginine methylation identification; protein methylation modification; protein sequence; solvent accessibility; Amino acids; Classification tree analysis; Data mining; Feature extraction; Proteins; Sequences; Solvents; Support vector machine classification; Support vector machines; USA Councils; Feature Selection; Granular Support Vector Machines (GSVM); Imbalanced Data Mining; Methylation Prediction; Protein Methylation;
Conference_Title :
Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
Conference_Location :
San Antonio, TX
Print_ISBN :
978-1-4244-2793-2
Electronic_ISBN :
1062-922X
DOI :
10.1109/ICSMC.2009.5345973