DocumentCode
723697
Title
merAligner: A Fully Parallel Sequence Aligner
Author
Georganas, Evangelos ; Buluc, Aydin ; Chapman, Jarrod ; Oliker, Leonid ; Rokhsar, Daniel ; Yelick, Katherine
Author_Institution
Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
fYear
2015
fDate
25-29 May 2015
Firstpage
561
Lastpage
570
Abstract
Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high-performance distributed hash table (the seed index) and uses the one-sided communication capabilities of Unified Parallel C (UPC) to facilitate fine-grained parallelism. We leverage communication optimizations during the construction of the distributed hash table, and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties that enable exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O-intensive phases and implement an effective load-balancing scheme. Results show that merAligner scales efficiently up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data, while significantly outperforming existing parallel alignment tools.
Keywords
C language; bioinformatics; cache storage; optimisation; parallel processing; resource allocation; Cray XC30 supercomputer; I/O intensive phases; aligning phase; bioinformatics; communication optimizations; communication reduction; fine-grained parallelism; high performance distributed hash table; load balancing scheme; merAligner; one-sided communication capabilities; parallel sequence aligner; query sequences; seed index; seed-and-extend algorithm; sequence matching; software caching schemes; unified parallel C; wheat genome data; Bioinformatics; Data structures; Genomics; Indexes; Load management; Optimization; Software;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location
Hyderabad
ISSN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2015.96
Filename
7161544