• DocumentCode
    167353
  • Title
    HiPGA: A High Performance Genome Assembler for Short Read Sequence Data
  • Author
    Xiaohui Duan ; Kun Zhao ; Weiguo Liu
  • Author_Institution
    Res. Center of Digital Media Technol., Shandong Univ., Jinan, China
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    576
  • Lastpage
    584
  • Abstract
    Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data with massive throughput. However, the generated reads are significantly shorter than those produced by the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, due to the continuing explosive growth of short read databases, there is high demand to accelerate the frequently repeated, long-running assembly task. In this paper, we present HiPGA, a scalable parallel algorithm that accelerates de Bruijn graph-based genome assembly for high-throughput short read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme as well as hybrid parallelism for the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7 while delivering comparable accuracy on 64 CPU cores of a compute cluster.
  • Keywords
    distributed memory systems; graph theory; parallel algorithms; shared memory systems; HiPGA; de Bruijn graph; de novo assembly algorithm; distributed-memory system; genome sequencing; high performance genome assembler; next-generation sequencing technology; parallelized file I/O scheme; scalable parallel algorithm; shared-memory multicore CPU; short read sequence data; Assembly; Bioinformatics; Couplings; Genomics; Parallel processing; Pipelines; Vectors; Genome Assembly; MPI; Multi-threading; Short Read Data; de Bruijn Graph;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.68
  • Filename
    6969437