DocumentCode
66883
Title
Heterogeneous Cloud Framework for Big Data Genome Sequencing
Author
Chao Wang ; Xi Li ; Peng Chen ; Aili Wang ; Xuehai Zhou ; Hong Yu
Author_Institution
Dept. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Volume
12
Issue
1
fYear
2015
fDate
Jan.-Feb. 1 2015
Firstpage
166
Lastpage
178
Abstract
Next-generation genome sequencing with short (long) reads is an emerging field in numerous scientific and big data research domains. However, data sizes and ease of access for scientific researchers are growing, and most current methodologies rely on a single acceleration approach, so they cannot meet the requirements imposed by explosive data scales and complexities. In this paper, we propose a novel FPGA-based acceleration solution that uses a MapReduce framework on multiple hardware accelerators. The combination of hardware acceleration and the MapReduce execution flow could greatly accelerate the task of aligning short reads to a known reference genome. To evaluate performance and other metrics, we conducted a theoretical speedup analysis on a MapReduce programming platform, which demonstrates that the proposed architecture has the potential to improve speedup for large-scale genome sequencing applications. As a practical study, we also built a hardware prototype on a real Xilinx FPGA chip and evaluated speedup, sensitivity, mapping quality, error rate, and hardware cost. Experimental results demonstrate that the proposed platform could efficiently accelerate the next-generation sequencing problem with satisfactory accuracy and acceptable hardware cost.
Keywords
Big Data; bioinformatics; cloud computing; field programmable gate arrays; genomics; Big Data genome sequencing; FPGA-based acceleration solution; MapReduce execution flow; MapReduce framework; MapReduce programming platform; aligning short length reads; data sizes; error rate; explosive data scales; hardware cost; heterogeneous cloud framework; known reference genome; large-scale genome sequencing applications; mapping quality; multiple hardware accelerators; next generation genome sequencing problem; next generation sequencing problem; real Xilinx FPGA chip; theoretical speedup analysis; Acceleration; Bioinformatics; Computer architecture; Field programmable gate arrays; Genomics; Hardware; Sequential analysis; FPGA; Short reads; genome sequencing; mapping; reconfigurable hardware;
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2014.2351800
Filename
6897956