Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs

Author

Dohi, Keisuke ; Benkrid, Khaled ; Ling, Cheng ; Hamada, Tsuyoshi ; Shibata, Yuichiro

Author_Institution

Dept. of Comput. & Inf. Sci., Nagasaki Univ., Nagasaki, Japan

fYear

2010

fDate

7-9 July 2010

Firstpage

29

Lastpage

36

Abstract

This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphic processing units (GPUs) with NVIDIA corporation´s Compute Unified Device Architecture (CUDA). Central to this is a divide and conquer approach which divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms) each running efficiently on the available hardware resources of the GPU in hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques in order to decrease the cost of thread synchronization, as well as loop unrolling in order to reduce the cost of conditional branches. While intermediate data is stored in global memory for large queries, the most inner loop in our implementation will only access shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput ranging between 9.09 GCUPS (Giga Cell Update per Second) and 12.71 GCUPS on a single-GPU version, and a throughput between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation achieves up to 46 % improvement in speed. The source code of our implementation is available in the public domain for Bioinformaticians to benefit from its performance.

Keywords

computer graphics; divide and conquer methods; query processing; software architecture; CUDA-compatible GPU; GPU; NVIDIA corporation; Smith-Waterman algorithm; bioinformaticians; compute unified device architecture; divide and conquer approach; giga cell update per second; graphic processing units; multithreaded parallel design; pairwise sequence alignment matrix; Algorithm design and analysis; Bioinformatics; Computer architecture; Concurrent computing; Costs; Graphics; Graphics processing unit; Hardware; Registers; Throughput; Yarn; CVOA; GPU; Smith-Waterman algorithm;

fLanguage

English

Publisher

ieee

Conference_Titel

Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on

Conference_Location

Rennes

ISSN

2160-0511

Print_ISBN

978-1-4244-6966-6

Electronic_ISBN

2160-0511

Type

conf

DOI

10.1109/ASAP.2010.5540796

Filename

5540796