• DocumentCode
    3417088
  • Title

    Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs

  • Author

    Dohi, Keisuke ; Benkrid, Khaled ; Ling, Cheng ; Hamada, Tsuyoshi ; Shibata, Yuichiro

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Nagasaki Univ., Nagasaki, Japan
  • fYear
    2010
  • fDate
    7-9 July 2010
  • Firstpage
    29
  • Lastpage
    36
  • Abstract
    This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphic processing units (GPUs) with NVIDIA corporation´s Compute Unified Device Architecture (CUDA). Central to this is a divide and conquer approach which divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms) each running efficiently on the available hardware resources of the GPU in hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques in order to decrease the cost of thread synchronization, as well as loop unrolling in order to reduce the cost of conditional branches. While intermediate data is stored in global memory for large queries, the most inner loop in our implementation will only access shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput ranging between 9.09 GCUPS (Giga Cell Update per Second) and 12.71 GCUPS on a single-GPU version, and a throughput between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation achieves up to 46 % improvement in speed. The source code of our implementation is available in the public domain for Bioinformaticians to benefit from its performance.
  • Keywords
    computer graphics; divide and conquer methods; query processing; software architecture; CUDA-compatible GPU; GPU; NVIDIA corporation; Smith-Waterman algorithm; bioinformaticians; compute unified device architecture; divide and conquer approach; giga cell update per second; graphic processing units; multithreaded parallel design; pairwise sequence alignment matrix; Algorithm design and analysis; Bioinformatics; Computer architecture; Concurrent computing; Costs; Graphics; Graphics processing unit; Hardware; Registers; Throughput; Yarn; CVOA; GPU; Smith-Waterman algorithm;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on
  • Conference_Location
    Rennes
  • ISSN
    2160-0511
  • Print_ISBN
    978-1-4244-6966-6
  • Electronic_ISBN
    2160-0511
  • Type

    conf

  • DOI
    10.1109/ASAP.2010.5540796
  • Filename
    5540796