DocumentCode
2491680
Title
A scalable reference-point based algorithm to efficiently search large chemical databases
Author
Napolitano, Francesco ; Tagliaferri, Roberto ; Baldi, Pierre
Author_Institution
Inst. for Genomics & Bioinf., Univ. of California-Irvine, Irvine, CA, USA
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
6
Abstract
High-Throughput Screening (HTS) is a powerful tool in drug discovery, but it is very expensive in terms of required equipment and running costs. The virtual equivalent of HTS is a molecular database that can be searched across millions of molecules by means of a similarity measure. In this work we propose a new class of bounds, algorithms, and storage strategies based on the Intersection Inequality [5] for the Tanimoto similarity, to improve on state-of-the-art performance in querying large repositories of binary fingerprints. We focus on a special case that we call the β = B algorithm. The performance of the algorithm is assessed by simulating queries over an excerpt of ChemDB [7]. We show that the average search can be up to 37% faster than using the Bit-Bound [4] alone, depending on the amount of space dedicated to the data structures needed by the algorithm.
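As a rough illustration of the kind of search the abstract refers to, the following Python sketch computes the Tanimoto similarity between binary fingerprints and prunes candidates with the standard bit-count bound, T(A, B) <= min(|A|, |B|) / max(|A|, |B|). It is not the β = B algorithm proposed in the paper; the function names, the integer-as-bitset representation, and the toy data are assumptions made for this example.

```python
# Illustrative sketch only: threshold search with bit-count pruning.
# Fingerprints are represented as Python ints used as bitsets (an assumption).

def popcount(x: int) -> int:
    """Number of bits set to 1 in the fingerprint."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A AND B| / |A OR B| of two binary fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return popcount(a & b) / union


def bit_bound(a_bits: int, b_bits: int) -> float:
    """Upper bound on the Tanimoto similarity from the bit counts alone."""
    larger = max(a_bits, b_bits)
    if larger == 0:
        return 1.0
    return min(a_bits, b_bits) / larger


def search(query: int, database: list[int], threshold: float) -> list[tuple[int, float]]:
    """Return (index, similarity) pairs with similarity >= threshold."""
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(database):
        # Skip the full comparison when the bound already rules the entry out.
        if bit_bound(q_bits, popcount(fp)) < threshold:
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits


if __name__ == "__main__":
    toy_db = [0b1111, 0b0111, 0b0001, 0b11111111]  # hypothetical fingerprints
    print(search(0b1111, toy_db, 0.7))
```

In a real system the bit counts would be precomputed and the database sorted or binned by count, so that entire ranges of fingerprints can be skipped rather than tested one at a time.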
Keywords
chemistry computing; query processing; ChemDB; Tanimoto similarity; binary fingerprints; data structures; drug discovery; high-throughput screening; intersection inequality; large chemical database searching; large repository querying; molecular databases; scalable reference-point based algorithm; storage strategies; Fingerprint recognition; Mathematical model
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location
Barcelona
ISSN
1098-7576
Print_ISBN
978-1-4244-6916-1
Type
conf
DOI
10.1109/IJCNN.2010.5596608
Filename
5596608