• DocumentCode
    46295
  • Title

    A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction

  • Author

    Ashtawy, Hossam M. ; Mahapatra, Nihar R.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Michigan State Univ., East Lansing, MI, USA
  • Volume
    12
  • Issue
    2
  • fYear
    2015
  • fDate
    March-April 2015
  • Firstpage
    335
  • Lastpage
    347
  • Abstract
    Accurately predicting the binding affinities of large diverse sets of protein-ligand complexes efficiently is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify potential drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive accuracy has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the scoring accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 and 2010 PDBbind benchmark datasets on both diverse and protein-family-specific test sets. We also investigate the influence of the size of the training dataset and the type and number of features used on scoring accuracy. We find that the best performing ML SF has a Pearson correlation coefficient of 0.806 between predicted and measured binding affinities compared to 0.644 achieved by a state-of-the-art conventional SF. We also find that ML SFs benefit more than their conventional counterparts from increases in the number of features and the size of training dataset. In addition, they perform better on novel proteins that they were never trained on before.
  • Keywords
    biochemistry; bioinformatics; drugs; learning (artificial intelligence); molecular biophysics; proteins; PDBbind benchmark datasets; Pearson correlation coefficient; drug discovery; machine learning scoring functions; protein-ligand binding affinity prediction; protein-ligand complexes; virtual screening; Accuracy; Barium; Databases; Drugs; Feature extraction; Proteins; Training; Drug discovery; machine learning; protein-ligand binding affinity; scoring function; scoring power; virtual screening;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2351824
  • Filename
    6883187