Abstract:
A comparative workflow combining linear and non-linear QSAR models was carried out to evaluate the predictive accuracy of the models and to predict the inhibitory activity of a series of aryl-substituted isobenzofuran-1(3H)-ones. The data set of 34 compounds was randomly divided into training and test sets. Molecular descriptors were selected using a genetic algorithm (GA) as the feature selection tool. Linear models based on multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS), and non-linear models based on artificial neural network (ANN), adaptive network-based fuzzy inference system (ANFIS) and support vector machine (SVM) methods, were developed and compared. Model accuracy was assessed by leave-one-out cross-validation (Q²(LOO)), a Y-randomization test and a group of compounds held out as an external test set. Six descriptors were selected by GA to develop the predictive models. Among the linear models, GA-PCR was more accurate than the rest, with R²(train) = 0.883, R²(test) = 0.897, R²(adj, train) = 0.829, R²(adj, test) = 0.849, F(train) = 24.07 and F(test) = 34.17. Among the non-linear models, GA-SVM (R²(train) = 0.992 and R²(test) = 0.997) showed high predictive accuracy for the inhibitory activity. The selected descriptors were found to play major roles in interpreting the biological activities of the compounds.
Keywords:
QSAR, genetic algorithms, global optimization, SVM
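
The two internal validation checks named in the abstract, leave-one-out cross-validation and Y-randomization, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a plain MLR model fitted with NumPy, defines Q²(LOO) as 1 - PRESS / SS_tot (one common convention), and uses synthetic data (24 hypothetical compounds, six hypothetical descriptors) in place of the actual data set. All function and variable names are illustrative.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict activities from descriptors using fitted MLR coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def q2_loo(X, y):
    """Leave-one-out Q^2: each compound is predicted by a model fitted without it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)  # predictive residual sum of squares
    return 1.0 - press / np.sum((y - np.mean(y)) ** 2)


def y_randomization(X, y, n_rounds=50, seed=0):
    """Refit on shuffled activities; returns the training R^2 of each scrambled model."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)
        scores.append(r2(y_perm, predict_mlr(fit_mlr(X, y_perm), X)))
    return np.array(scores)


# Synthetic stand-in data (assumption): 24 compounds, 6 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 6))
true_w = np.array([1.0, -0.5, 0.8, 0.3, -1.2, 0.6])
y = X @ true_w + 0.1 * rng.normal(size=24)

r2_train = r2(y, predict_mlr(fit_mlr(X, y), X))
q2 = q2_loo(X, y)
scrambled = y_randomization(X, y)

print(f"R2(train) = {r2_train:.3f}, Q2(LOO) = {q2:.3f}")
print(f"mean R2 after Y-randomization = {scrambled.mean():.3f}")
```

For a least-squares model, Q²(LOO) defined this way never exceeds the training R², and a model that captures a real structure-activity relationship should score far higher than its Y-randomized counterparts. The same two functions could wrap any fit/predict pair (for example a PCR or SVM model) in place of `fit_mlr` and `predict_mlr`.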