• DocumentCode
    677982
  • Title
    An Empirical Investigation of Virtual Screening
  • Author
    Rafati-Afshar, Amir Ali ; Bouchachia, Abdelhamid
  • Author_Institution
    Smart Technol. Res. Centre, Bournemouth Univ., Bournemouth, UK
  • fYear
    2013
  • fDate
    13-16 Oct. 2013
  • Firstpage
    2641
  • Lastpage
    2646
  • Abstract
    Drug discovery relies heavily on data processing. Virtual screening (VS) is a common drug discovery method that examines chemical structures (molecules) to identify those likely to bind to a particular drug target. VS can be cast as either a matching problem or a classification problem, and in both cases the quality of the data matters greatly. The number of features (and their properties) and data imbalance are common problems in the chemical datasets used for VS. This paper investigates how to deal with these two problems in order to improve the accuracy of VS and, specifically, to reduce the false positive rate. On the one hand, we use the synthetic minority over-sampling technique (SMOTE) to balance the data; on the other hand, we investigate different molecular descriptors and fingerprints as features. A classification approach is used to assess the performance of four chosen classifiers, first individually and then in combination. As an alternative, an instance-based approach is employed to observe the effect on accuracy. Results from the classification approach show that higher accuracy and a lower false positive rate can be achieved by first balancing the datasets with SMOTE and then classifying them. The effects of descriptors and fingerprints on accuracy and false positive rate can only be discussed for each dataset separately. Combining distance matrices of different structural fingerprints does not cause active and similar compounds to appear at the top of the dissimilarity ranking.
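The balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style over-sampling. This is not the authors' implementation: the function name `smote`, its parameters, and the toy "actives" and "inactives" data are assumptions made for illustration, and the sketch picks base samples at random, a simplification of the original per-sample procedure. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For each minority sample, store the indices of its k nearest
    # neighbours within the minority class.
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        p, q = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic


# Hypothetical toy data: 4 "active" compounds versus 10 "inactive" ones,
# each described by two made-up numeric features.
actives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inactives = [[5.0 + 0.1 * i, 5.0] for i in range(10)]

new_actives = smote(actives, n_new=len(inactives) - len(actives), k=2)
balanced_actives = actives + new_actives
```

After this step both classes have the same number of samples, and a classifier would be trained on `balanced_actives` together with `inactives`. In practice, over-sampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.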
  • Keywords
    drugs; medical computing; molecular biophysics; pattern classification; SMOTE; VS; chemical datasets; chemical structures; classification approach; classifiers; data imbalance; data processing; datasets balancing; dissimilarity ranking; distance matrices; drug discovery; drug target; instance-based approach; molecular descriptors; structural fingerprints; synthetic minority over sampling technique; virtual screening; Accuracy; Compounds; Fingerprint recognition; Optimized production technology; Radio frequency; Training; classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
  • Conference_Location
    Manchester
  • Type
    conf
  • DOI
    10.1109/SMC.2013.451
  • Filename
    6722204