Title :
SeqHive: A Reconfigurable Computer Cluster for Genome Re-sequencing
Author :
Stevens, Kristian ; Chen, Henry ; Filiba, Terry ; McMahon, Peter ; Song, Yun S.
Author_Institution :
Dept. of Comput. Sci., Univ. of California Davis, Davis, CA, USA
Date :
Aug. 31 2010-Sept. 2 2010
Abstract :
We demonstrate how Field Programmable Gate Arrays (FPGAs) may be used to address the computing challenges associated with assembling genome sequences from recent ultra-high-throughput sequencing technologies. Advances in sequencing technology allow researchers to generate immense amounts of raw data in the form of short reads with high error rates. For most applications, using this data effectively first requires accurate alignment to a reference genome. While dynamic programming (DP) alignment algorithms are generally avoided on conventional architectures due to their computational complexity, they can be tailored for efficient implementation on systolic architectures. We describe and implement the first system capable of assembling large genomes using DP. We implement application-specific DP algorithms for aligning data from ultra-high-throughput sequencers in a reconfigurable computing cluster. To obtain the necessary throughput while maintaining scoring integrity, we extend the compact encoding scheme of Lipton and Lopresti for our application. Each FPGA is capable of rapidly aligning multiple reads in parallel against a long reference genome. The reconfigurable cluster proves to be scalable and capable of processing real-world datasets with a sustained performance of 11 tera cell updates per second. We examine the advantages and practicality of our system by benchmarking real genomic data from a large sequencing project. Our exhaustive validation confirms that application-specific computing hardware can provide more accurate results than current heuristic methods and remain practical. While directly addressing the important problem of genomic assembly, particularly in circumstances where error rates or evolutionary divergence are high, the methods presented are also relevant to many other current applications for this type of data.
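As a rough illustration of the kind of DP alignment the abstract refers to, the sketch below aligns one short read against a longer reference using unit-cost edit distance, with the read free to start and end anywhere in the reference. It is a plain software sketch under stated assumptions, not the SeqHive hardware design: the function name `align_read`, the unit-cost scoring, and the returned tuple are all hypothetical, and the Lipton and Lopresti compact encoding is not modeled. The `cells` counter shows what a "cell update" means in the throughput figure: one evaluation of the inner recurrence.

```python
def align_read(read, reference):
    """Semi-global edit-distance alignment of a short read to a reference.

    Hypothetical helper for illustration only. The whole read must be
    aligned, but it may begin and end at any reference position.
    Returns (best_distance, end_position, cell_updates), where
    end_position is the 1-based reference index at which the best
    alignment ends.
    """
    n = len(reference)
    # Row 0 is all zeros: starting anywhere in the reference is free.
    prev = [0] * (n + 1)
    cells = 0
    for i, r in enumerate(read, start=1):
        # Column 0: the first i read bases are unmatched, costing i.
        curr = [i] + [0] * n
        for j, g in enumerate(reference, start=1):
            substitute = prev[j - 1] + (r != g)  # match costs 0, mismatch 1
            skip_read_base = prev[j] + 1         # read base with no partner
            skip_ref_base = curr[j - 1] + 1      # reference base with no partner
            curr[j] = min(substitute, skip_read_base, skip_ref_base)
            cells += 1                           # one DP cell update
        prev = curr
    # Ending anywhere is free: take the minimum over the last row.
    end = min(range(n + 1), key=prev.__getitem__)
    return prev[end], end, cells


if __name__ == "__main__":
    print(align_read("ACGT", "TTACGTTT"))
```

The work grows as read length times reference length, which is why this recurrence is usually avoided on conventional processors for genome-scale references. Cells on the same anti-diagonal of the table do not depend on each other, which is the property a systolic array can exploit to update many cells per clock cycle.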
Keywords :
biology computing; computational complexity; dynamic programming; field programmable gate arrays; genomics; FPGA; SeqHive; genome re-sequencing; reconfigurable computer cluster; genome sequencing; sequence alignment;
Conference_Title :
Field Programmable Logic and Applications (FPL), 2010 International Conference on
Conference_Location :
Milano
Print_ISBN :
978-1-4244-7842-2
DOI :
10.1109/FPL.2010.121