Title :
Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression
Author :
Ko, Gene M. ; Reddy, A. Srinivas ; Kumar, Sunil ; Bailey, Barbara A. ; Garg, Rajni
Author_Institution :
Comput. Sci. Res. Center, San Diego State Univ., San Diego, CA, USA
Abstract :
The present study develops a classification model that relates the binding pockets of 70 HIV-1 protease crystal structures, described by their structural descriptors, to the HIV-1 protease inhibitors with which they are complexed. A Random Forest classification model is used to reduce the chemical descriptor space from 456 to the 12 most relevant descriptors, ranked by the Gini importance measure. These 12 descriptors are then used to build classification models with linear discriminant analysis (LDA) and logistic regression (LR). The top eight descriptors produced the best LDA model, with an overall error of 30% and a leave-one-out cross-validation error of 44.29%, while the top five descriptors produced the best LR model, with an overall error of 28.57% and a leave-one-out cross-validation error of 41.43%. Hierarchical clustering was performed on the top five and top eight descriptors to test whether the descriptors selected by Random Forest group the binding pockets according to their complexed ligands. The selected descriptors are expected to play a crucial role in understanding the structure of the HIV-1 protease binding pocket in terms of its chemical descriptors.
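Two quantities in the abstract can be made concrete with a short sketch: the Gini impurity on which Random Forest's Gini importance is built, and the leave-one-out cross-validation (LOOCV) error used to compare the LDA and LR models. The sketch below is illustrative only and is not the authors' implementation. The function names (`gini_impurity`, `nearest_centroid`, `loocv_error`) are hypothetical, and a nearest-centroid classifier is swapped in as a simple stand-in for LDA or LR so the example runs with the Python standard library alone.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical classifier signature: (training vectors, training labels, query) -> label
Classifier = Callable[[List[Vector], List[str], Vector], str]


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels.

    Random Forest's Gini importance for a descriptor sums the impurity
    decreases produced by splits on that descriptor across all trees.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def nearest_centroid(train_x: List[Vector], train_y: List[str], query: Vector) -> str:
    """Stand-in classifier (NOT LDA or LR): pick the class whose mean
    descriptor vector is closest to the query in squared Euclidean distance."""
    counts = Counter(train_y)
    sums = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error(xs: List[Vector], ys: List[str], classify: Classifier) -> float:
    """Leave-one-out CV: hold out each sample once, fit on the remaining
    samples, and return the fraction of held-out samples misclassified."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if classify(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)
```

With 70 crystal structures, each additional LOOCV misclassification raises the error by 1/70, about 1.43 percentage points, so the reported 44.29% (LDA) and 41.43% (LR) correspond to 31 and 29 misclassified structures, respectively.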
Keywords :
biology computing; macromolecules; pattern classification; pattern clustering; regression analysis; Gini importance measurement; HIV-1 protease crystal structures; hierarchical clustering; leave-one-out cross validation; linear discriminant analysis; logistic regression; random forest classification model; structural descriptors; Artificial neural networks; Chemicals; Data mining; Drugs; Inhibitors; Linear discriminant analysis; Logistics; Predictive models; Proteins; Support vector machines;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4244-6766-2
DOI :
10.1109/CIBCB.2010.5510465