Title :
Rapid and Robust Denoising of Pyrosequenced Amplicons for Metagenomics
Author :
Byunghan Lee ; Joonhong Park ; Sungroh Yoon
Author_Institution :
Electrical Engineering and Computer Science, Seoul National University, Seoul, South Korea
Abstract :
Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique that is very popular in microbial community analysis because of its longer read length compared to alternative methods. Computational tools are indispensable for processing raw data from pyrosequencers; in particular, noise removal is a critical data-mining step for obtaining robust sequence reads. However, the slow speed of existing denoisers has become a bottleneck for the whole pyrosequencing process and has also hindered efforts to improve robustness. To address these issues, we propose a new approach that can accelerate the denoising process substantially. With our approach, it takes only about 2 hours to denoise 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones, a task that would take nearly 2.5 days with existing software tools. Furthermore, our approach effectively reduces overestimation of the number of OTUs, producing 6.7 times fewer species-level OTUs on average than a state-of-the-art alternative under the same conditions. We hope that, with our approach, metagenomic sequencing will become an even more appealing tool for microbial community analysis.
Keywords :
biology computing; data mining; genomics; graphics processing units; molecular biophysics; 16S rRNA clone; GPU; data processing; gene catalogue; graphics processing unit; high-throughput pyrosequencing technique; metagenomic sequencing; microbial community; noise removal; operational taxonomic unit; pyrosequenced amplicons denoising; sequence read; software tool; time 2 hour; Acceleration; Communities; Instruction sets; Noise; Noise reduction; Robustness; amplicons; biomedical informatics; cluster analysis; metagenomics; pyrosequencing
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.68