DocumentCode :
228698
Title :
Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
Author :
Georganas, Evangelos ; Buluc, Aydin ; Chapman, Jarrod ; Oliker, Leonid ; Rokhsar, Daniel ; Yelick, Katherine
Author_Institution :
Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
fYear :
2014
fDate :
16-21 Nov. 2014
Firstpage :
437
Lastpage :
448
Abstract :
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages one-sided communication capabilities of the Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
Keywords :
Cray computers; biology computing; genomics; graph theory; input-output programs; parallel algorithms; program assemblers; Cray XC30; I/O requirement; Meraculous; UPC; data hazard avoidance; de novo assembler; de novo genome assembly; distributed hash table; fine-grained parallelism; genomic sequence; human genome; k-mer analysis; memory requirements; one-sided communication capability; optimized parallelization; parallel algorithm; parallel de Bruijn graph construction and traversal; production assembler; scalability property; unified parallel C; wheat genome; Algorithm design and analysis; Assembly; Bioinformatics; Genomics; Memory management; Program processors; Sequential analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4799-5499-5
Type :
conf
DOI :
10.1109/SC.2014.41
Filename :
7013023
Link To Document :
بازگشت