Title :
Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
Author :
Georganas, Evangelos ; Buluc, Aydin ; Chapman, Jarrod ; Oliker, Leonid ; Rokhsar, Daniel ; Yelick, Katherine
Author_Institution :
Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
Abstract :
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages one-sided communication capabilities of the Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
Keywords :
Cray computers; biology computing; genomics; graph theory; input-output programs; parallel algorithms; program assemblers; Cray XC30; I/O requirement; Meraculous; UPC; data hazard avoidance; de novo assembler; de novo genome assembly; distributed hash table; fine-grained parallelism; genomic sequence; human genome; k-mer analysis; memory requirements; one-sided communication capability; optimized parallelization; parallel algorithm; parallel de Bruijn graph construction and traversal; production assembler; scalability property; unified parallel C; wheat genome; Algorithm design and analysis; Assembly; Bioinformatics; Genomics; Memory management; Program processors; Sequential analysis;
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4799-5499-5