DocumentCode :
1809785
Title :
Workshop: Comparative assembly of metagenomic sequences
Author :
Liu, Bo ; Pop, Mihai
Author_Institution :
Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
fYear :
2012
fDate :
23-25 Feb. 2012
Firstpage :
1
Lastpage :
1
Abstract :
Next-generation sequencing technologies permit metagenomic studies to characterize the entire bacterial community within an environment by producing large numbers of short, noisy DNA reads. One of the most challenging computational tasks is assembling millions of short reads into longer contigs, which serve as the basis for subsequent computational analyses. Several de novo assembly methods designed for single genomes have been tuned and applied to metagenomic data sets, but very little progress has been made on comparative assembly for metagenomics. In addition, more and more bacterial genome sequences are becoming available, providing a great opportunity for reference-assisted assembly. In this project, we introduce a computational tool for comparative assembly of metagenomic sequences. Our software first selects reference genomes based on taxonomic profiles estimated by MetaPhyler, and then quickly maps the metagenomic reads to those reference genomes. When building contigs, we employ a greedy solution to the minimum set-covering problem to produce longer contigs. Furthermore, we propose a hybrid assembly approach, which shows significantly better results than either comparative or de novo assembly alone. We analyzed two mock and 728 real metagenomic samples from the Human Microbiome Project and achieved results comparable to those of state-of-the-art de novo assemblers. With our proposed hybrid approach, we assembled 79% of the reads into contigs of at least 300 bp.
Keywords :
DNA; cellular biophysics; genomics; microorganisms; molecular biophysics; MetaPhyler; bacterial community; bacterial genome sequences; comparative assembly; de novo assembly methods; human microbiome project; hybrid assembly approach; metagenomic data set; metagenomic sequences; next-generation sequencing technologies; reference-assisted assembly; short noisy DNA reads; state-of-the-art de novo assembly; subsequent computational analysis; Assembly; Bioinformatics; Buildings; DNA; Educational institutions; Genomics; Microorganisms; Comparative Genomics; Genome Assembly; Metagenomics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2012 IEEE 2nd International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-1320-9
Electronic_ISBN :
978-1-4673-1319-3
Type :
conf
DOI :
10.1109/ICCABS.2012.6182671
Filename :
6182671