• DocumentCode
    659545
  • Title
    Optimizing a MapReduce module of preprocessing high-throughput DNA sequencing data
  • Author
    Wei-Chun Chung; Yu-Jung Chang; Chien-Chih Chen; Der-Tsai Lee; Jan-Ming Ho
  • Author_Institution
    Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The MapReduce framework has become the de facto choice for big data analysis in a variety of applications. In the MapReduce programming model, computation is distributed across a cluster of computing nodes that run in parallel. The performance of a MapReduce application is therefore affected by the system and middleware, the characteristics of the data, and the design and implementation of the algorithms. In this study, we focus on optimizing the performance of one MapReduce application, CloudRS, which tackles the problem of detecting and removing errors in de novo next-generation sequencing genomic data. We present three strategies for communication between the Map() and Reduce() functions, namely the content-exchange, content-grouping, and index-only strategies, which differ in the way messages are exchanged between the two functions. We also present experimental results comparing the performance of the three strategies.
  • Keywords
    biology computing; data analysis; middleware; molecular biophysics; parallel programming; CloudRS; MapReduce framework; MapReduce module; MapReduce programming model; big data analysis; content-exchange strategy; content-grouping strategy; data preprocessing; high-throughput DNA sequencing data; index-only strategy; next-generation sequencing data; performance optimization; Bioinformatics; Data handling; Data storage systems; Genomics; Information management; Optimization; Sequential analysis; error correction; genome assembly; mapreduce; next-generation sequencing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 IEEE International Conference on Big Data
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691694
  • Filename
    6691694