• DocumentCode
    2491680
  • Title
    A scalable reference-point based algorithm to efficiently search large chemical databases
  • Author
    Napolitano, Francesco ; Tagliaferri, Roberto ; Baldi, Pierre
  • Author_Institution
    Inst. for Genomics & Bioinf., Univ. of California-Irvine, Irvine, CA, USA
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    High-Throughput Screening (HTS) is a powerful tool in drug discovery, but it is very expensive in terms of required equipment and running costs. The virtual equivalent of HTS is a molecular database that can be searched across millions of molecules by means of a similarity measure. In this work we propose a new class of bounds, algorithms, and storage strategies based on the Intersection Inequality [5] for the Tanimoto similarity, to improve on state-of-the-art performance in querying large repositories of binary fingerprints. We focus on a special case that we call the β = B algorithm. The performance of the algorithm is assessed by simulating queries over an excerpt of ChemDB [7]. We show that the average search can be up to 37% faster than with the Bit-Bound [4] alone, depending on the amount of space dedicated to the data structures the algorithm needs.
  • Keywords
    chemistry computing; query processing; ChemDB; Tanimoto similarity; binary fingerprints; data structures; drug discovery; high-throughput screening; intersection inequality; large chemical database searching; large repository querying; molecular databases; scalable reference-point based algorithm; storage strategies; Fingerprint recognition; Mathematical model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2010 International Joint Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-6916-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2010.5596608
  • Filename
    5596608