DocumentCode :
167353
Title :
HiPGA: A High Performance Genome Assembler for Short Read Sequence Data
Author :
Xiaohui Duan ; Kun Zhao ; Weiguo Liu
Author_Institution :
Res. Center of Digital Media Technol., Shandong Univ., Jinan, China
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
576
Lastpage :
584
Abstract :
Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data at massive throughput. However, the generated reads are significantly shorter than those produced by the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, because of the continuing explosive growth of short read databases, there is high demand to accelerate the often repeated, long-running assembly task. In this paper, we present HiPGA, a scalable parallel algorithm that accelerates de Bruijn graph-based genome assembly for high-throughput short read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme as well as hybrid parallelism for the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to a factor of 7 while delivering comparable accuracy on 64 CPU cores of a compute cluster.
Keywords :
distributed memory systems; graph theory; parallel algorithms; shared memory systems; HiPGA; de Bruijn graph; de novo assembly algorithm; distributed-memory system; genome sequencing; high performance genome assembler; next-generation sequencing technology; parallelized file I/O scheme; scalable parallel algorithm; shared-memory multicore CPU; short read sequence data; Assembly; Bioinformatics; Couplings; Genomics; Parallel processing; Pipelines; Vectors; Genome Assembly; MPI; Multi-threading; Short Read Data; de Bruijn Graph;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.68
Filename :
6969437