• DocumentCode
    20566
  • Title

    Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU

  • Author

    de O.Sandes, Edans Flavius ; de Melo, Alba Cristina M.A.

  • Author_Institution
    University of Brasilia (UnB), Brazil
  • Volume
    24
  • Issue
    5
  • fYear
    2013
  • fDate
    May-13
  • Firstpage
    1009
  • Lastpage
    1021
  • Abstract
    In Genome Projects, biological sequences are aligned thousands of times, in a daily basis. The Smith-Waterman algorithm is able to retrieve the optimal local alignment with quadratic time and space complexity. So far, aligning huge sequences, such as whole chromosomes, with the Smith-Waterman algorithm has been regarded as unfeasible, due to huge computing and memory requirements. However, high-performance computing platforms such as GPUs are making it possible to obtain the optimal result for huge sequences in reasonable time. In this paper, we propose and evaluate CUDAlign 2.1, a parallel algorithm that uses GPU to align huge sequences, executing the Smith-Waterman algorithm combined with Myers-Miller, with linear space complexity. In order to achieve that, we propose optimizations which are able to reduce significantly the amount of data processed, while enforcing full parallelism most of the time. Using the NVIDIA GTX 560 Ti board and comparing real DNA sequences that range from 162 KBP (Thousand Base Pairs) to 59 MBP (Million Base Pairs), we show that CUDAlign 2.1 is scalable. Also, we show that CUDAlign 2.1 is able to produce the optimal alignment between the chimpanzee chromosome 22 (33 MBP) and the human chromosome 21 (47 MBP) in 8.4 hours and the optimal alignment between the chimpanzee chromosome Y (24 MBP) and the human chromosome Y (59 MBP) in 13.1 hours.
  • Keywords
    Bioinformatics; Biological cells; DNA; Graphics processing unit; Instruction sets; Optimization; Bioinformatics; GPU; parallel algorithms; sequence alignment;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2012.194
  • Filename
    6226380