DocumentCode
3417088
Title
Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs
Author
Dohi, Keisuke ; Benkrid, Khaled ; Ling, Cheng ; Hamada, Tsuyoshi ; Shibata, Yuichiro
Author_Institution
Dept. of Comput. & Inf. Sci., Nagasaki Univ., Nagasaki, Japan
fYear
2010
fDate
7-9 July 2010
Firstpage
29
Lastpage
36
Abstract
This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphic processing units (GPUs) with NVIDIA corporation´s Compute Unified Device Architecture (CUDA). Central to this is a divide and conquer approach which divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms) each running efficiently on the available hardware resources of the GPU in hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques in order to decrease the cost of thread synchronization, as well as loop unrolling in order to reduce the cost of conditional branches. While intermediate data is stored in global memory for large queries, the most inner loop in our implementation will only access shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput ranging between 9.09 GCUPS (Giga Cell Update per Second) and 12.71 GCUPS on a single-GPU version, and a throughput between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation achieves up to 46 % improvement in speed. The source code of our implementation is available in the public domain for Bioinformaticians to benefit from its performance.
Keywords
computer graphics; divide and conquer methods; query processing; software architecture; CUDA-compatible GPU; GPU; NVIDIA corporation; Smith-Waterman algorithm; bioinformaticians; compute unified device architecture; divide and conquer approach; giga cell update per second; graphic processing units; multithreaded parallel design; pairwise sequence alignment matrix; Algorithm design and analysis; Bioinformatics; Computer architecture; Concurrent computing; Costs; Graphics; Graphics processing unit; Hardware; Registers; Throughput; Yarn; CVOA; GPU; Smith-Waterman algorithm;
fLanguage
English
Publisher
ieee
Conference_Titel
Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on
Conference_Location
Rennes
ISSN
2160-0511
Print_ISBN
978-1-4244-6966-6
Electronic_ISBN
2160-0511
Type
conf
DOI
10.1109/ASAP.2010.5540796
Filename
5540796
Link To Document