Title :
Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices
Author :
Hains, Doug ; Cashero, Zach ; Ottenberg, Mark ; Bohm, Wim ; Rajopadhye, Sanjay
Author_Institution :
Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA
Abstract :
CUDASW++ is a parallelization of the Smith-Waterman algorithm for CUDA graphics processing units that computes the similarity score of a query sequence against each sequence in a database. It computes the score for a given pair of sequences with one of two kernel functions: the inter-task kernel or the intra-task kernel. We identify the intra-task kernel as a major bottleneck in CUDASW++ and develop a new intra-task kernel that is faster than the original. We describe the development of our kernel as a series of incremental changes that provide insight into a number of issues that must be considered when developing any algorithm for the CUDA architecture. We analyze the performance of our kernel against the original and show that using our intra-task kernel improves the overall performance of CUDASW++ by on the order of three to four giga-cell updates per second (GCUPS) on various benchmark databases.
Keywords :
biology computing; computer graphic equipment; coprocessors; parallel algorithms; parallel architectures; query processing; CUDA architecture; CUDA enabled devices; CUDA graphical processing unit; CUDASW++; Smith-Waterman algorithm parallelization; Smith-Waterman parallelization; intertask kernel; intratask kernel; kernel function; query sequence; sequence alignment; similarity score; Computer architecture; Databases; Graphics processing unit; Heuristic algorithms; Instruction sets; Kernel; Tiles;
Conference_Title :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2011.182