• DocumentCode
    66883
  • Title
    Heterogeneous Cloud Framework for Big Data Genome Sequencing
  • Author
    Chao Wang ; Xi Li ; Peng Chen ; Aili Wang ; Xuehai Zhou ; Hong Yu
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
  • Volume
    12
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan.-Feb. 1 2015
  • Firstpage
    166
  • Lastpage
    178
  • Abstract
    The next-generation genome sequencing problem with short (long) reads is an emerging field in numerous scientific and big data research domains. However, data sizes and ease of access for scientific researchers are growing, and most current methodologies rely on a single acceleration approach, so they cannot meet the requirements imposed by explosive data scales and complexities. In this paper, we propose a novel FPGA-based acceleration solution that uses the MapReduce framework on multiple hardware accelerators. The combination of hardware acceleration and the MapReduce execution flow could greatly accelerate the task of aligning short-length reads to a known reference genome. To evaluate performance and other metrics, we conducted a theoretical speedup analysis on a MapReduce programming platform, which demonstrates that our proposed architecture has the potential to improve the speedup for large-scale genome sequencing applications. As a practical study, we have also built a hardware prototype on a real Xilinx FPGA chip. Key metrics on speedup, sensitivity, mapping quality, error rate, and hardware cost are evaluated. Experimental results demonstrate that the proposed platform could efficiently accelerate the next-generation sequencing problem with satisfactory accuracy and acceptable hardware cost.
  • Keywords
    Big Data; bioinformatics; cloud computing; field programmable gate arrays; genomics; Big Data genome sequencing; FPGA-based acceleration solution; MapReduce execution flow; MapReduce framework; MapReduce programming platform; aligning short length reads; data sizes; error rate; explosive data scales; hardware cost; heterogeneous cloud framework; known reference genome; large-scale genome sequencing applications; mapping quality; multiple hardware accelerators; next generation genome sequencing problem; next generation sequencing problem; real Xilinx FPGA chip; theoretical speedup analysis; Acceleration; Bioinformatics; Computer architecture; Field programmable gate arrays; Genomics; Hardware; Sequential analysis; FPGA; Short reads; genome sequencing; mapping; reconfigurable hardware;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    ieee
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2351800
  • Filename
    6897956