DocumentCode
167353
Title
HiPGA: A High Performance Genome Assembler for Short Read Sequence Data
Author
Xiaohui Duan ; Kun Zhao ; Weiguo Liu
Author_Institution
Res. Center of Digital Media Technol., Shandong Univ., Jinan, China
fYear
2014
fDate
19-23 May 2014
Firstpage
576
Lastpage
584
Abstract
Emerging next-generation sequencing technologies have opened up new opportunities for genome sequencing by generating read data at massive throughput. However, the generated reads are significantly shorter than those produced by traditional Sanger shotgun sequencing. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, because short read databases continue to grow explosively, there is high demand to accelerate the frequently repeated, long-running assembly task. In this paper, we present HiPGA, a scalable parallel algorithm that accelerates de Bruijn graph-based genome assembly for high-throughput short read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme together with hybrid parallelism across the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7x while delivering comparable accuracy on 64 CPU cores of a compute cluster.
Keywords
distributed memory systems; graph theory; parallel algorithms; shared memory systems; HiPGA; de Bruijn graph; de novo assembly algorithm; distributed-memory system; genome sequencing; high performance genome assembler; next-generation sequencing technology; parallelized file I/O scheme; scalable parallel algorithm; shared-memory multicore CPU; short read sequence data; Assembly; Bioinformatics; Couplings; Genomics; Parallel processing; Pipelines; Vectors; Genome Assembly; MPI; Multi-threading; Short Read Data; de Bruijn Graph
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location
Phoenix, AZ
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.68
Filename
6969437