Author/Authors :
Ming Hao, Yan Li, Yonghua Wang, Shuwei Zhang
Abstract :
In this work, a genetic algorithm (GA)-support vector machine (SVM) coupled approach is proposed for optimizing the subset of 2D molecular descriptors generated for a series of antagonists of P2Y12, a member of the G-protein-coupled receptor family, with the statistical performance and efficiency of the model being simultaneously enhanced by SVM kernel-based nonlinear projection. To our knowledge, this is the first QSAR study to predict P2Y12 inhibitory activity from an unusually large dataset of 364 structurally diverse P2Y12 antagonists. In addition, three other widely used approaches, i.e., partial least squares (PLS), random forest (RF), and Gaussian process (GP) routines combined with GA (namely GA–PLS, GA–RF, and GA–GP, respectively), are also employed and compared with the GA–SVM method in terms of several rigorous evaluation criteria. The results indicate that the GA–SVM model is a powerful tool for predicting the activity of P2Y12 antagonists, producing a conventional correlation coefficient R² of 0.976 and a cross-validated coefficient of 0.829 for the training set, as well as a predictive coefficient of 0.811 for the test set. It significantly outperforms the other three methods, whose average R² is 0.894 (their average cross-validation and test-set values are not reproduced in this record). The proposed model, with excellent predictive capacity in both internal and external validation, should be helpful for screening and optimizing potential P2Y12 antagonists prior to chemical synthesis in drug development.
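The GA–SVM coupling described in the abstract can be illustrated with a minimal sketch: a genetic algorithm evolves boolean masks over the descriptor columns, and each mask is scored by the cross-validated R² of an RBF-kernel support vector regressor trained on the selected columns. Everything specific here is an assumption made for illustration and not taken from the paper: the synthetic data, the function names `fitness` and `ga_select`, the GA operators (binary tournament, uniform crossover, bit-flip mutation, elitism), the hyperparameters, and the use of scikit-learn's `SVR` in place of whatever SVM implementation the authors used.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fitness(mask, X, y, cv=5):
    """Mean cross-validated R^2 of an RBF-kernel SVR on the selected columns."""
    if not mask.any():
        return -np.inf  # an empty descriptor subset cannot be scored
    # Hypothetical settings: standardised inputs, C=10, default gamma.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    return cross_val_score(model, X[:, mask], y, cv=cv, scoring="r2").mean()


def ga_select(X, y, pop_size=12, generations=8, p_mut=0.05, seed=0):
    """Evolve boolean descriptor masks; return the best mask and its score."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5  # random initial subsets
    best_mask, best_score = None, -np.inf
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        top = int(np.argmax(scores))
        if best_mask is None or scores[top] > best_score:
            best_mask, best_score = pop[top].copy(), scores[top]
        children = [best_mask.copy()]  # elitism: carry the best subset forward
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            # Uniform crossover followed by bit-flip mutation.
            child = np.where(rng.random(n_feat) < 0.5, p1, p2)
            child ^= rng.random(n_feat) < p_mut
            children.append(child)
        pop = np.array(children)
    return best_mask, best_score


# Synthetic stand-in for a descriptor matrix: 80 "compounds", 10 "descriptors",
# with a nonlinear response that depends only on columns 0 and 3.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(80, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] ** 2 + 0.05 * data_rng.normal(size=80)

mask, q2 = ga_select(X, y)
print("selected descriptor indices:", np.flatnonzero(mask))
print("cross-validated R^2:", round(float(q2), 3))
```

Swapping `SVR` for another regressor inside `fitness` gives the analogous GA–PLS, GA–RF, or GA–GP variants under the same selection loop; a held-out test set, which this sketch omits, would be needed to estimate external predictive quality.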
Keywords :
P2Y12, Descriptor selection, Support vector machine, Genetic algorithm