Title :
Parallel Construction of Bidirected String Graphs for Genome Assembly
Author :
Jackson, Benjamin G. ; Aluru, Srinivas
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA
Abstract :
Graph theoretic models for genome assembly are continually being proposed and refined. At the same time, large scale assembly projects rely on the overlap-layout-consensus assembly paradigm, in which the best pairwise alignments serve as seeds for a greedy extension of contigs. These methods, which largely rely on local information, are used despite research that demonstrates the superiority of other graph models, largely because the memory requirement of such models is prohibitive on single processor architectures. In this paper, we present a parallel algorithm for constructing bidirected string graphs from whole genome shotgun sequencing data, for use in the assembly problem. Our algorithm uses O(n/p) local computation - where n is the total size of shotgun sequences and p is the number of processors - and a constant number of all-to-all communications. We demonstrate scalability of the algorithm on the Blue Gene/L, and show that graphs for large, complex genome sequencing projects with deep sequence coverage can be effectively handled using parallel computers.
Keywords :
biology computing; directed graphs; genetics; parallel algorithms; parallel machines; Blue Gene/L; O(n/p) local computation; bidirected string graphs; genome assembly; genome shotgun sequencing data; graph theoretic models; parallel algorithm; parallel computers; single processor architectures; Assembly; Bioinformatics; Computer architecture; DNA; Genomics; Large-scale systems; Parallel algorithms; Parallel processing; Scalability; Sequences; bidirected; blue gene; genome assembly; graph;
Conference_Titel :
Parallel Processing, 2008. ICPP '08. 37th International Conference on
Conference_Location :
Portland, OR
Print_ISBN :
978-0-7695-3374-2
Electronic_ISBN :
0190-3918
DOI :
10.1109/ICPP.2008.70