Title :
A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction
Author :
Ashtawy, Hossam M. ; Mahapatra, Nihar R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Michigan State Univ., East Lansing, MI, USA
Abstract :
Accurately predicting the binding affinities of large diverse sets of protein-ligand complexes efficiently is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify potential drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein´s binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive accuracy has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the scoring accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 and 2010 PDBbind benchmark datasets on both diverse and protein-family-specific test sets. We also investigate the influence of the size of the training dataset and the type and number of features used on scoring accuracy. We find that the best performing ML SF has a Pearson correlation coefficient of 0.806 between predicted and measured binding affinities compared to 0.644 achieved by a state-of-the-art conventional SF. We also find that ML SFs benefit more than their conventional counterparts from increases in the number of features and the size of training dataset. In addition, they perform better on novel proteins that they were never trained on before.
Keywords :
biochemistry; bioinformatics; drugs; learning (artificial intelligence); molecular biophysics; proteins; PDBbind benchmark datasets; Pearson correlation coefficient; drug discovery; machine learning scoring functions; protein-ligand binding affinity prediction; protein-ligand complexes; virtual screening; Accuracy; Barium; Databases; Drugs; Feature extraction; Proteins; Training; Drug discovery; machine learning; protein-ligand binding affinity; scoring function; scoring power; virtual screening;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2014.2351824