Title :
Manifold learning reveals nonlinear structure in metagenomic profiles
Author :
Jiang, Xingpeng ; Hu, Xiaohua ; Shen, Huiyu ; He, Tingting
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
Abstract :
Using metagenomics to detect the global structure of a microbial community remains a significant challenge. The structure and functions of a microbial community are complicated not only by the complex interactions among microbes but also by their complicated interactions with confounding environmental factors. Recently, dimension reduction methods such as principal component analysis, non-negative matrix factorization and canonical correlation analysis have been employed extensively to investigate the complex structure embedded in metagenomic profiles, which summarize the abundance of functional or taxonomic categories in metagenomic studies. However, metagenomic profiles do not necessarily satisfy the "assumption of linearity" behind these methods, so it is worth investigating how nonlinear methods can be used in metagenomic studies. In this paper, a nonlinear manifold learning method, Isomap, is used to visualize and analyze large-scale metagenomic profiles. Isomap was applied to a large-scale Pfam profile derived from 45 metagenomes in the Global Ocean Sampling expedition. In our results, a novel nonlinear structure of protein families is identified, and the relationships between the identified nonlinear components and the environmental factors of the global ocean are explored. The results indicate the strength of nonlinear methods in learning complex microbial structure. With a large number of newly sequenced metagenomes becoming available, nonlinear methods such as Isomap could become necessary complements to the methods currently in wide use.
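The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the textbook Isomap recipe (k-nearest-neighbour graph, shortest-path "geodesic" distances, classical multidimensional scaling), set beside a PCA baseline to show why a linear method can miss curved structure. It is not the authors' code. The function names `isomap`, `pca` and `spearman`, the choice of `n_neighbors=4`, and the synthetic three-quarter-circle data are illustrative assumptions. The toy matrix stands in for a points-by-features profile (for example metagenomes by Pfam families, or the transpose if families are the points being embedded).

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap sketch: kNN graph -> graph geodesics -> classical MDS.

    X is an (n_points, n_features) matrix; returns (n_points, n_components).
    Assumes no duplicate rows (so each point is its own nearest neighbour).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Symmetric k-nearest-neighbour graph; missing edges are infinite.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is self
    for i in range(n):
        for j in nearest[i]:
            G[i, j] = G[j, i] = D[i, j]

    # Geodesic distances: Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; raise n_neighbors")

    # Classical MDS: double-centre squared geodesics, keep the top eigenpairs.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def pca(X, n_components=2):
    """Linear baseline: project centred data onto its top principal axes."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Synthetic toy data (NOT Global Ocean Sampling data): 50 points on a
# three-quarter circle, so the hidden gradient theta is a curved 1-D manifold.
theta = np.linspace(0.0, 1.5 * np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

iso = isomap(X, n_neighbors=4, n_components=1)[:, 0]
lin = pca(X, n_components=1)[:, 0]

print("Isomap span:", round(float(iso.max() - iso.min()), 2))  # ~ arc length 1.5*pi
print("PCA span:   ", round(float(lin.max() - lin.min()), 2))  # at most the diameter 2
print("|rho(theta, Isomap)|:", round(float(abs(spearman(theta, iso))), 3))
```

On this toy curve the single Isomap coordinate spans roughly the arc length (about 1.5π) and orders the points along the hidden gradient θ, whereas the first principal component cannot span more than the circle's diameter of 2 and folds the two ends of the arc together. The Floyd-Warshall step costs O(n³) in the number of embedded points, so larger inputs would call for a sparse shortest-path routine such as Dijkstra's algorithm.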
Keywords :
bioinformatics; data analysis; data visualisation; genomics; learning (artificial intelligence); microorganisms; proteins; Global Ocean Sampling expedition; Isomap; canonical correlation analysis; complex microbe interactions; dimension reduction methods; environmental factors; functional categorization; large scale Pfam profile; large scale metagenomic profile analysis; large scale metagenomic profile visualisation; metagenomic profile nonlinear structure; microbial community global structure; nonlinear manifold learning method; nonlinear methods; nonnegative matrix factorization; principal component analysis; sequenced metagenomes; taxonomic categorization; Communities; Correlation; Covariance matrix; Environmental factors; Matrix decomposition; Oceans; Principal component analysis; Isomap; Nonlinear dimension reduction; metagenomic profile; non-negative matrix factorization; principal component analysis;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2559-2
Electronic_ISBN :
978-1-4673-2558-5
DOI :
10.1109/BIBM.2012.6392684