Title :
LA2SNE: A novel stochastic neighbor embedding approach for microbiome data visualization
Author :
Weiwei Xu ; Rong Xie ; Xingpeng Jiang ; Xiaohua Hu
Author_Institution :
Int. Sch. of Software, Wuhan Univ., Wuhan, China
Abstract :
Visualization of large-scale data is the first step toward gaining preliminary insight into complex biological data. In recent years, many statistical methods have been designed to support data visualization. Stochastic Neighbor Embedding (SNE) is one such approach; it uses probabilistic distances to model differences among points in the data space. SNE and its variants (e.g., t-SNE) have demonstrated superiority over other methods in exploring complex data. With these methods, however, similar data points tend to group together, which prevents the identification of subtle differences. A good visualization method should not only present a clear data structure but also distinguish subtle differences. In this paper, we propose a novel extension of SNE with three innovations: (1) We replace the Gaussian distribution in SNE with a Laplace distribution in both the high-dimensional and the low-dimensional space. The Laplace distribution has heavier tails than the Gaussian distribution, and thus it can be used to overcome the crowding problem noted in SNE and its variants. (2) We use a symmetric modification of the Kullback-Leibler divergence as the objective function, which provides more flexibility to the model. (3) We add a graph Laplacian regularization term to the objective function, which helps preserve the manifold structure among data points. Experiments on simulated data and human microbiome data indicate that the proposed approach performs better than other methods at distinguishing crowded data points.
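The three ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the kernel scales `sigma` and `b`, the row-wise normalisation, the symmetrised edge weights, the weight `lam`, and all function names are hypothetical and are not taken from LA2SNE itself.

```python
import math


def gaussian_kernel(d, sigma=1.0):
    """Unnormalised Gaussian similarity for a distance d (scale is assumed)."""
    return math.exp(-d * d / (2.0 * sigma * sigma))


def laplace_kernel(d, b=1.0):
    """Unnormalised Laplace similarity; decays more slowly for large d."""
    return math.exp(-d / b)


def pairwise_affinities(points, kernel):
    """Row-normalised affinities from a kernel of Euclidean distance."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = [0.0 if i == j else kernel(math.dist(points[i], points[j]))
                   for j in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def symmetric_kl(p, q, eps=1e-12):
    """KL(P||Q) + KL(Q||P) over off-diagonal entries (eps avoids log(0))."""
    total = 0.0
    for i, (p_row, q_row) in enumerate(zip(p, q)):
        for j, (pij, qij) in enumerate(zip(p_row, q_row)):
            if i != j:
                total += (pij - qij) * math.log((pij + eps) / (qij + eps))
    return total


def graph_laplacian_penalty(embedding, weights):
    """0.5 * sum_ij w_ij * ||y_i - y_j||^2.

    For symmetric weights W this equals tr(Y^T L Y) with L = D - W.
    """
    n = len(embedding)
    return 0.5 * sum(weights[i][j] * math.dist(embedding[i], embedding[j]) ** 2
                     for i in range(n) for j in range(n))


def objective(high, low, lam=0.1):
    """Illustrative cost: symmetric KL plus a graph Laplacian regulariser."""
    p = pairwise_affinities(high, laplace_kernel)
    q = pairwise_affinities(low, laplace_kernel)
    n = len(p)
    # Symmetrised high-dimensional affinities serve as edge weights (assumed).
    w = [[0.5 * (p[i][j] + p[j][i]) for j in range(n)] for i in range(n)]
    return symmetric_kl(p, q) + lam * graph_laplacian_penalty(low, w)
```

At a distance of 5 with unit scales, `laplace_kernel` is several orders of magnitude larger than `gaussian_kernel`, which is the heavier-tail property the abstract relies on. An optimiser would minimise `objective` with respect to the low-dimensional coordinates; that step is omitted here.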
Keywords :
Gaussian distribution; Laplace equations; bioinformatics; data structures; data visualisation; genomics; microorganisms; statistical databases; Kullback-Leibler divergence measure; LA2SNE; Laplacian distribution; complex biological data; complex data; crowding data points; data space; data structure; graph Laplacian regularization terms; high-dimensional space; human microbiome data; large-scale data visualization; low-dimensional space; manifold structure; microbiome data visualization; novel stochastic neighbor embedding approach; probabilistic distance; statistical visualization methods; Cost function; Data visualization; Linear programming; Principal component analysis; Probabilistic logic; Stochastic processes; Dimension reduction; Laplacian regularization; Microbiome;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999294