DocumentCode :
677982
Title :
An Empirical Investigation of Virtual Screening
Author :
Rafati-Afshar, Amir Ali ; Bouchachia, Abdelhamid
Author_Institution :
Smart Technol. Res. Centre, Bournemouth Univ., Bournemouth, UK
fYear :
2013
fDate :
13-16 Oct. 2013
Firstpage :
2641
Lastpage :
2646
Abstract :
Drug discovery relies heavily on data processing. Virtual screening (VS) is a common drug discovery method that examines chemical structures (molecules) to identify those likely to bind to a particular drug target. VS can be cast as either a matching problem or a classification problem, and in both cases data quality matters greatly. Two general problems of the chemical datasets used in VS are the number of features (and their properties) and data imbalance. This paper investigates how to deal with these two problems in order to improve the accuracy of VS and, specifically, to reduce the false positive rate. On the one hand, we use the synthetic minority over-sampling technique (SMOTE) to balance the data; on the other hand, we investigate different molecular descriptors and fingerprints as features. A classification approach is used to assess the performance of four chosen classifiers, first individually and then in combination. As an alternative, an instance-based approach is employed to observe its effect on accuracy. Results from the classification approach show that higher accuracy and a lower false positive rate can be achieved by first balancing the datasets with SMOTE and then classifying them. The effects of descriptors and fingerprints on accuracy and false positive rate can only be discussed for each dataset separately. Combining the distance matrices of different structural fingerprints does not cause active and similar compounds to appear at the top of the dissimilarity ranking.
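The balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, and the toy two-dimensional feature vectors are assumptions made for illustration only. The sketch assumes real-valued descriptors; binary fingerprints would need a different interpolation rule.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(actives, inactives, k=5, seed=0):
    """Oversample the smaller class until both classes have equal size."""
    if len(actives) < len(inactives):
        actives = actives + smote(actives, len(inactives) - len(actives), k, seed)
    elif len(inactives) < len(actives):
        inactives = inactives + smote(inactives, len(actives) - len(inactives), k, seed)
    return actives, inactives


if __name__ == "__main__":
    # Hypothetical toy data: 3 active compounds versus 10 inactive ones.
    toy_actives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    toy_inactives = [[5.0 + i, 5.0] for i in range(10)]
    bal_actives, bal_inactives = balance(toy_actives, toy_inactives, k=2)
    print(len(bal_actives), len(bal_inactives))
```

The balanced classes would then be passed to whichever classifiers are being compared; the original samples are kept unchanged and only synthetic minority samples are appended.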
Keywords :
drugs; medical computing; molecular biophysics; pattern classification; SMOTE; VS; chemical datasets; chemical structures; classification approach; classifiers; data imbalance; data processing; datasets balancing; dissimilarity ranking; distance matrices; drug discovery; drug target; instance-based approach; molecular descriptors; structural fingerprints; synthetic minority over sampling technique; virtual screening; Accuracy; Compounds; Drugs; Fingerprint recognition; Optimized production technology; Radio frequency; Training; classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
Type :
conf
DOI :
10.1109/SMC.2013.451
Filename :
6722204