• DocumentCode
    185337
  • Title

    Delivering bioinformatics MapReduce applications in the cloud

  • Author

    Forer, Lukas ; Lipic, Tomislav ; Schonherr, Sven ; Weisensteiner, Hansi ; Davidovic, Davor ; Kronenberg, Florian ; Afgan, Enis

  • Author_Institution
    Div. of Genetic Epidemiology, Med. Univ. of Innsbruck, Innsbruck, Austria
  • fYear
    2014
  • fDate
    26-30 May 2014
  • Firstpage
    373
  • Lastpage
    377
  • Abstract
    The ever-increasing production and availability of data in bioinformatics demand a paradigm shift towards novel solutions for efficient data storage and processing, such as the MapReduce data-parallel programming model and the corresponding Apache Hadoop framework. Despite the evident potential of this model and the existence of available algorithms and applications, especially for batch processing of large data sets as in Next Generation Sequencing analysis, bioinformatics MapReduce applications have yet to be widely adopted in bioinformatics data analysis. We identify two prerequisites for their adoption and utilization: (1) the ability to compose complex workflows from multiple bioinformatics MapReduce tools while abstracting the technical details of how those tools are combined and executed, allowing bioinformatics domain experts to focus on the analysis, and (2) the availability of accessible and flexible computing infrastructure for this type of data processing. This paper presents the integration of two existing systems: Cloudgene, a bioinformatics MapReduce workflow framework, and CloudMan, a cloud manager for delivering application execution environments. Together, they enable delivery of bioinformatics MapReduce applications in the cloud.
  • Keywords
    batch processing (computers); bioinformatics; cloud computing; data analysis; parallel programming; Apache Hadoop framework; CloudMan; Cloudgene; MapReduce data parallel programming model; application execution environments; batch processing; bioinformatics MapReduce applications; bioinformatics MapReduce workflow framework; bioinformatics data analysis; cloud manager; data processing; data production; data storage; flexible computing infrastructure; large data sets; multiple bioinformatics MapReduce tools; next generation sequencing analysis; Bioinformatics; Biological system modeling; Cloud computing; Computational modeling; Data analysis; Genomics; Sequential analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on
  • Conference_Location
    Opatija
  • Print_ISBN
    978-953-233-081-6
  • Type

    conf

  • DOI
    10.1109/MIPRO.2014.6859593
  • Filename
    6859593