DocumentCode
1809785
Title
Workshop: Comparative assembly of metagenomic sequences
Author
Liu, Bo ; Pop, Mihai
Author_Institution
Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
fYear
2012
fDate
23-25 Feb. 2012
Firstpage
1
Lastpage
1
Abstract
Next-generation sequencing technologies permit metagenomic studies to characterize the entire bacterial community within an environment by producing large numbers of short, noisy DNA reads. One of the most challenging computational tasks is assembling millions of short reads into longer contigs, which serve as the basis for subsequent computational analyses. Several de novo assembly methods geared towards single genomes have been tuned and applied to metagenomic data sets, but very little progress has been made in comparative assembly for metagenomics. In addition, more and more bacterial genome sequences are becoming available, providing a great opportunity for reference-assisted assembly. In this project, we introduce a computational tool for comparative assembly of metagenomic sequences. Our software first selects reference genomes based on taxonomic profiles estimated by MetaPhyler, and then quickly maps the metagenomic reads to the reference genomes. When building contigs, we employ a greedy solution to the minimum set-covering problem to produce longer contigs. Furthermore, we propose a hybrid assembly approach, which yields significantly better results than either comparative or de novo assembly alone. We analyzed two mock and 728 real metagenomic samples from the Human Microbiome Project, and achieved results comparable to those of state-of-the-art de novo assemblers. Through our proposed hybrid approach, we assembled 79% of the reads into contigs of at least 300 bp.
Keywords
DNA; cellular biophysics; genomics; microorganisms; molecular biophysics; MetaPhyler; bacterial community; bacterial genome sequences; comparative assembly; de novo assembly methods; human microbiome project; hybrid assembly approach; metagenomic data set; metagenomic sequences; next-generation sequencing technologies; reference-assisted assembly; short noisy DNA reads; state-of-the-art de novo assembly; subsequent computational analysis; Assembly; Bioinformatics; Buildings; DNA; Educational institutions; Genomics; Microorganisms; Comparative Genomics; Genome Assembly; Metagenomics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Advances in Bio and Medical Sciences (ICCABS), 2012 IEEE 2nd International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4673-1320-9
Electronic_ISBN
978-1-4673-1319-3
Type
conf
DOI
10.1109/ICCABS.2012.6182671
Filename
6182671