• DocumentCode
    723697
  • Title
    merAligner: A Fully Parallel Sequence Aligner
  • Author
    Georganas, Evangelos; Buluc, Aydin; Chapman, Jarrod; Oliker, Leonid; Rokhsar, Daniel; Yelick, Katherine
  • Author_Institution
    Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    561
  • Lastpage
    570
  • Abstract
    Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high performance distributed hash table (seed index) and uses one-sided communication capabilities of the Unified Parallel C to facilitate a fine-grained parallelism. We leverage communication optimizations at the construction of the distributed hash table and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties enabling exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O intensive phases and implement an effective load balancing scheme. Results show that merAligner exhibits efficient scaling up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data while significantly outperforming existing parallel alignment tools.
  • Keywords
    C language; bioinformatics; cache storage; optimisation; parallel processing; resource allocation; Cray XC30 supercomputer; I/O intensive phases; aligning phase; bioinformatics; communication optimizations; communication reduction; fine-grained parallelism; high performance distributed hash table; load balancing scheme; merAligner; one-sided communication capabilities; parallel sequence aligner; query sequences; seed index; seed-and-extend algorithm; sequence matching; software caching schemes; unified parallel C; wheat genome data; Bioinformatics; Data structures; Genomics; Indexes; Load management; Optimization; Software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
  • Conference_Location
    Hyderabad
  • ISSN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2015.96
  • Filename
    7161544