DocumentCode :
2491680
Title :
A scalable reference-point based algorithm to efficiently search large chemical databases
Author :
Napolitano, Francesco ; Tagliaferri, Roberto ; Baldi, Pierre
Author_Institution :
Inst. for Genomics & Bioinf., Univ. of California-Irvine, Irvine, CA, USA
fYear :
2010
fDate :
18-23 July 2010
Firstpage :
1
Lastpage :
6
Abstract :
High-Throughput Screening (HTS) is a powerful tool in drug discovery, but it is very expensive in terms of required equipment and running costs. The virtual equivalent of HTS is a molecular database that can be searched across millions of molecules by means of a similarity measure. In this work we propose a new class of bounds, algorithms, and storage strategies based on the Intersection Inequality [5] for the Tanimoto similarity to improve on state-of-the-art performance in querying large repositories of binary fingerprints. We focus on a special case that we call the β = B algorithm. The performance of the algorithm is assessed by simulating queries over an excerpt of ChemDB [7]. We show that the average search can be up to 37% faster than using the Bit-Bound [4] alone, depending on the amount of space dedicated to the data structures needed by the algorithm.
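For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a Tanimoto threshold search pruned with the Bit-Bound, i.e. the fact that the similarity of two fingerprints cannot exceed the ratio of the smaller to the larger bit count. It is not the paper's β = B algorithm or its Intersection Inequality bounds. Fingerprints are modeled as Python integers, and the names `popcount`, `tanimoto`, `bit_bound`, and `search` are illustrative assumptions rather than anything taken from the paper.

```python
def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a non-negative int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two binary fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return popcount(a & b) / union


def bit_bound(na: int, nb: int) -> float:
    """Upper bound on the Tanimoto similarity given only the two bit counts."""
    hi = max(na, nb)
    return min(na, nb) / hi if hi else 1.0


def search(query: int, database: list[int], threshold: float) -> list[tuple[int, float]]:
    """Return (fingerprint, similarity) pairs with similarity >= threshold."""
    nq = popcount(query)
    hits = []
    for fp in database:
        # Prune: if even the upper bound is below the threshold, skip the
        # full similarity computation for this fingerprint.
        if bit_bound(nq, popcount(fp)) < threshold:
            continue
        s = tanimoto(query, fp)
        if s >= threshold:
            hits.append((fp, s))
    return hits
```

In a real system the bit counts would typically be precomputed and the database sorted or binned by bit count, so that whole ranges of fingerprints can be skipped instead of being tested one by one.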
Keywords :
chemistry computing; query processing; ChemDB; Tanimoto similarity; binary fingerprints; data structures; drug discovery; high-throughput screening; intersection inequality; large chemical database searching; large repository querying; molecular databases; scalable reference-point based algorithm; storage strategies; Fingerprint recognition; Mathematical model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location :
Barcelona
ISSN :
1098-7576
Print_ISBN :
978-1-4244-6916-1
Type :
conf
DOI :
10.1109/IJCNN.2010.5596608
Filename :
5596608