Title :
Infer Metagenomic Abundance and Reveal Homologous Genomes Based on the Structure of Taxonomy Tree
Author :
Yu-Qing Qiu; Xue Tian; Shihua Zhang
Author_Institution :
Nat. Center for Math. & Interdiscipl. Sci., Acad. of Math. & Syst. Sci., Beijing, China
Abstract :
Metagenomic research uses sequencing technologies to investigate the genetic biodiversity of microbiomes present in various ecosystems or animal tissues. The composition of a microbial community is closely associated with the environment in which its organisms live. As ever larger numbers of short sequencing reads from microorganism genomes are obtained, accurately estimating the abundance of microorganisms within a metagenomic sample is an increasing challenge in bioinformatics. In this paper, we describe a hierarchical taxonomy tree-based mixture model (HTTMM) that estimates the abundance of taxa within a microbial community by incorporating the structure of the taxonomy tree. In this model, genome-specific short reads and short reads from regions homologous among genomes can be distinguished and are represented by leaf nodes and intermediate nodes of the taxonomy tree, respectively. We solve the model with an expectation-maximization (EM) algorithm. Using simulated and real-world data, we demonstrate that the proposed method outperforms both flat mixture model-based methods and lowest common ancestor (LCA)-based methods. Moreover, the model can reveal previously unaddressed homologous genomes.
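The abstract does not give the model's equations, so the following is only a rough illustration of the general idea under stated assumptions: a generic EM for the mixing proportions of a finite mixture whose components are taxonomy-tree nodes, where a leaf stands for genome-specific reads and an internal node for reads shared across its descendant genomes. The function name `em_node_abundance`, the read-by-node likelihood matrix, and the convergence settings are hypothetical and are not the authors' HTTMM formulation.

```python
import numpy as np

def em_node_abundance(likelihood, n_iter=500, tol=1e-8):
    """Hypothetical sketch: EM for the mixing proportions of a finite mixture.

    likelihood: array of shape (n_reads, n_nodes). Entry [r, k] is an assumed,
    precomputed probability of read r given that it came from tree node k
    (a leaf for genome-specific reads, an internal node for homologous reads).
    Returns the estimated proportion of reads attributed to each node.
    """
    lik = np.asarray(likelihood, dtype=float)
    if np.any(lik.sum(axis=1) <= 0):
        raise ValueError("every read needs a positive likelihood for some node")
    n_nodes = lik.shape[1]
    pi = np.full(n_nodes, 1.0 / n_nodes)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each node for each read
        weighted = lik * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: each node's new proportion is its average responsibility
        new_pi = resp.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi

# Nodes: leaf A, leaf B, internal node AB (reads shared by A and B).
pi = em_node_abundance([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(pi)
```

In this toy input every read is compatible with exactly one node, so the estimate is simply the read fraction per node (0.5, 0.25, 0.25). When a read is compatible with several nodes, the E-step splits it in proportion to the current node abundances, which is what lets a mixture model apportion ambiguous reads instead of assigning them to a single lowest common ancestor.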
Keywords :
bioinformatics; biological tissues; expectation-maximization algorithm; genetics; genomics; microorganisms; mixture models; animal tissues; ecosystems; genetic biodiversity; hierarchical taxonomy tree-based mixture model; homologous genomes; metagenomic research; microbial community; microbiomes; microorganism genomes; sequencing technologies; Databases; Taxonomy; Vegetation; Metagenomics; abundance estimation; taxonomy tree
Journal_Title :
IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI :
10.1109/TCBB.2015.2415814