Title :
Short-Read Mapping by a Systolic Custom FPGA Computation
Author :
Preusser, Thomas B. ; Knodel, Oliver ; Spallek, Rainer G.
Author_Institution :
Dept. of Comput. Sci., Tech. Univ. Dresden, Dresden, Germany
fDate :
April 29 2012-May 1 2012
Abstract :
The mapping of reads, i.e. short DNA base pair strings, to large genome databases has become a critical operation for genetic analysis and diagnosis. Although this mapping operation is a simple string search tolerant of some character mismatches, it is yet extremely challenging due to the tremendous size of the searched genome databases. It is the heavy use of search heuristics such as BLAST, Maq and Bowtie, which makes the economic deployment of read mappers possible. While these heuristics achieve feasible computation times, they also sacrifice the accuracy of the mapping results, which is itself a high value for reliable diagnostics. The traditional software implementations are unable to exploit the tremendous parallelism, which is available in the mapping of thousands and millions of reads. Merely a handful of concurrent control flows, and thus searches, can be performed efficiently on contemporary multicores. Even GPU assistance only enables a few dozens of parallel searches. This paper proposes a systolic custom computation on FPGA, which implements the read mapping on a massively parallel architecture. It implements a true search and guarantees to find all read mappings under a configurable threshold of base pair mismatches. The highly regular design from compact string matchers enables the implementation of thousands of parallel search engines on a single FPGA device. The presented map per platform combines highest computational performance with an excellent result accuracy. Its performance is more than twice as high as that of a recently published comparable FPGA map per. Already when implemented on a contemporary mid-size FPGA, it meets the search speed of software heuristics, which only detect little more than half of the valid read mappings. The map per easily scales to large FPGA devices, which can, thus, implement accurate high-performance volume mappers. Accurate mapping is made available in application domains that could only afford fuzzy heuristics- by now.
Keywords :
biology computing; field programmable gate arrays; genomics; systolic arrays; BLAST; FPGA computation; Maq; base pair mismatches; character mismatches; diagnosis; fuzzy heuristics; genetic analysis; reliable diagnostics; search heuristics; searched genome databases; short DNA base pair strings; short-read mapping; software heuristics; software implementations; systolic custom computation; tremendous parallelism; Bioinformatics; Clocks; Concurrent computing; Databases; Field programmable gate arrays; Genomics; Table lookup; FPGA; Sequence Alignment; short-reads mapping;
Conference_Titel :
Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4673-1605-7
DOI :
10.1109/FCCM.2012.37