• DocumentCode
    680782
  • Title

    Using Iterative MapReduce for Parallel Virtual Screening

  • Author

    Ahmed, Laeeq ; Edlund, Ake ; Laure, Erwin ; Spjuth, Ola

  • Author_Institution
    Dept. of HPCViz, R. Inst. of Technol., Stockholm, Sweden
  • Volume
    2
  • fYear
    2013
  • fDate
    2-5 Dec. 2013
  • Firstpage
    27
  • Lastpage
    32
  • Abstract
    Virtual screening is a technique in chemoinformatics used for drug discovery by searching large libraries of molecule structures. Virtual screening often uses SVM, a supervised machine learning technique used for regression and classification analysis. Virtual screening using SVM not only involves huge datasets, but is also computationally expensive, with a complexity that can grow at least up to O(n^2). SVM-based applications most commonly use MPI, which becomes complex and impractical with large datasets. As an alternative to MPI, MapReduce and its different implementations have been successfully used on commodity clusters to analyse problems with very large datasets. Because it involves large libraries of molecule structures, virtual screening is a good candidate for MapReduce. In this paper we present a MapReduce implementation of SVM-based virtual screening using Spark, an iterative MapReduce programming model. We show that our implementation has good scaling behaviour and opens up the possibility of using huge public cloud infrastructures efficiently for virtual screening.
  • Keywords
    Big Data; chemistry computing; computational complexity; data analysis; iterative methods; learning (artificial intelligence); message passing; support vector machines; virtual reality; MPI; SVM; Spark; chemo informatics; classification analysis; commodity clusters; data analysis; drug discovery; iterative MapReduce programming model; molecule structures; parallel virtual screening; public cloud infrastructure; regression analysis; scaling behaviour; supervised machine learning technique; Cloud computing; Fault tolerance; Fault tolerant systems; Predictive models; Sparks; Support vector machines; Training; Big Data; Chemoinformatics; MapReduce; Parallel SVM; Spark;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on
  • Conference_Location
    Bristol
  • Type

    conf

  • DOI
    10.1109/CloudCom.2013.99
  • Filename
    6735391