A scalable reference-point based algorithm to efficiently search large chemical databases

Author

Napolitano, Francesco ; Tagliaferri, Roberto ; Baldi, Pierre

Author_Institution

Inst. for Genomics & Bioinf., Univ. of California-Irvine, Irvine, CA, USA

fYear

2010

fDate

18-23 July 2010

Firstpage

1

Lastpage

6

Abstract

High-Throughput Screening (HTS) is a powerful tool in drug discovery, but it is very expensive in terms of required equipment and running costs. Its virtual equivalent is the molecular database, which can be searched across millions of molecules by means of a similarity measure. In this work we propose a new class of bounds, algorithms, and storage strategies based on the Intersection Inequality [5] for the Tanimoto similarity, to improve state-of-the-art performance in querying large repositories of binary fingerprints. We focus on a special case that we call the β = B algorithm. The performance of the algorithm is assessed by simulating queries over an excerpt of ChemDB [7]. We show that the average search can be up to 37% faster than using the Bit-Bound [4] alone, depending on the amount of space dedicated to the data structures the algorithm needs.
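
As an illustration of the baseline the abstract compares against, the following is a minimal sketch of a Tanimoto threshold search over binary fingerprints with bit-count pruning. It is not the β = B algorithm, whose details are not given in this record. It assumes the commonly stated form of the bit bound, T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|), and represents each fingerprint as a Python integer. The function names (`popcount`, `tanimoto`, `bit_bound`, `search`) and the convention that two empty fingerprints have similarity 1.0 are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def popcount(x: int) -> int:
    """Number of bits set to 1 in a fingerprint stored as an int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two binary fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return popcount(a & b) / union


def bit_bound(na: int, nb: int) -> float:
    """Upper bound on Tanimoto similarity using only the two bit counts."""
    if max(na, nb) == 0:
        return 1.0
    return min(na, nb) / max(na, nb)


def search(
    query: int, database: Iterable[int], threshold: float
) -> Tuple[List[Tuple[int, float]], int]:
    """Return (hits, pruned): hits are (index, similarity) pairs at or above
    the threshold; pruned counts fingerprints skipped by the bound alone."""
    nq = popcount(query)
    hits: List[Tuple[int, float]] = []
    pruned = 0
    for idx, fp in enumerate(database):
        # If the bound is already below the threshold, the exact
        # similarity cannot reach it, so the comparison is skipped.
        if bit_bound(nq, popcount(fp)) < threshold:
            pruned += 1
            continue
        s = tanimoto(query, fp)
        if s >= threshold:
            hits.append((idx, s))
    return hits, pruned


if __name__ == "__main__":
    db = [0b11110000, 0b11100000, 0b00000001, 0b11111111]
    print(search(0b11110000, db, 0.7))
```

In a real system the bit counts would be precomputed and the database sorted or binned by count, so that whole ranges are skipped without a per-entry check; the linear scan here only keeps the sketch short.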

Keywords

chemistry computing; query processing; ChemDB; Tanimoto similarity; binary fingerprints; data structures; drug discovery; high-throughput screening; intersection inequality; large chemical database searching; large repository querying; molecular databases; scalable reference-point based algorithm; storage strategies; Fingerprint recognition; Mathematical model;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), The 2010 International Joint Conference on

Conference_Location

Barcelona

ISSN

1098-7576

Print_ISBN

978-1-4244-6916-1

Type

conf

DOI

10.1109/IJCNN.2010.5596608

Filename

5596608