Title :
Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly
Author :
Cook, Jeffrey J. ; Zilles, Craig
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL
Abstract :
In this work, we analyze the memory-intensive bioinformatics problem of "de novo" DNA sequence assembly, which is the process of assembling short DNA sequences obtained by experiment into larger contiguous sequences. In particular, we analyze the performance scaling challenges inherent to de Bruijn graph-based assembly, which is particularly well suited for the data produced by "next generation" sequencing machines. Unlike many bioinformatics codes which are computation-intensive or control-intensive, we find the memory footprint to be the primary performance issue for de novo sequence assembly. Specifically, we make four main contributions: 1) we demonstrate analytically that performing error correction before sequence assembly enables larger genomes to be assembled in a given amount of memory, 2) we identify that the use of this technique provides the key performance advantage to the leading assembly code, Velvet, 3) we demonstrate how this pre-assembly error correction technique can be subdivided into multiple passes to enable de Bruijn graph-based assembly to scale to even larger genomes, and 4) we demonstrate how Velvet's in-core performance can be improved using memory-centric optimizations.
Keywords :
bioinformatics; error correction codes; genetics; graph theory; optimisation; parallel algorithms; DNA sequence assembly code; computation-intensive; control-intensive; de Bruijn graph-based assembly; genome; memory-centric optimization; memory-intensive bioinformatics code; parallel assembly algorithm; preassembly error correction; sequencing machine; Assembly; Bioinformatics; Computer science; Costs; DNA; Error analysis; Error correction codes; Genomics; Performance analysis; Sequences
Conference_Title :
Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-4184-6
DOI :
10.1109/ISPASS.2009.4919646