Title :
A Resampling Based Clustering Algorithm for Replicated Gene Expression Data
Author :
Han Li ; Chun Li ; Xiaodan Fan
Author_Institution :
Dept. of Stat., Shenzhen Univ., Shenzhen, China
Abstract :
In gene expression data analysis, clustering is a fruitful exploratory technique for revealing the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce noise, multiple experimental replicates are usually performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling-based clustering algorithm for genes with replicated expression measurements. Assuming the replicates are exchangeable, we formulate the problem in the bootstrap framework and aim to infer the consensus clustering from bootstrap samples of the replicates. In our approach, we adopt a mixed-effect model to accommodate heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that, by taking advantage of the full replicate data, our algorithm produces more reliable clusters and performs robustly in diverse scenarios, especially when the data are subject to multiple sources of variance.
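The general idea of consensus clustering over bootstrap samples of replicates can be illustrated with a minimal sketch. It is not the algorithm of the paper: the mixed-effect model and quasi-MCMC inference are replaced here by a plain k-means step on bootstrap mean profiles, and the consensus is taken as connected components of a thresholded co-association matrix. All function names, the `data[gene][replicate][condition]` layout, and the 0.5 threshold are assumptions made for illustration.

```python
import random


def bootstrap_mean_profiles(data, rng):
    """Resample each gene's replicates with replacement and average them."""
    profiles = []
    for replicates in data:
        draw = [rng.choice(replicates) for _ in replicates]
        profiles.append([sum(col) / len(draw) for col in zip(*draw)])
    return profiles


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, n_iter=20):
    """Plain Lloyd's k-means (a stand-in, not the paper's model).

    Uses deterministic farthest-first initialization.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(data, k, n_boot=50, seed=0):
    """Fraction of bootstrap runs in which each pair of genes co-clusters."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        labels = kmeans_labels(bootstrap_mean_profiles(data, rng), k)
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / n_boot for c in row] for row in counts]


def consensus_clusters(coassoc, threshold=0.5):
    """Connected components of the graph linking genes with
    co-association >= threshold (an assumed consensus rule)."""
    n = len(coassoc)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and coassoc[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters
```

Calling `consensus_clusters(co_association(data, k=2))` on a nested list of replicated profiles returns lists of gene indices. Because each bootstrap draw keeps the replicate-level variation instead of collapsing it into a single mean profile, genes whose replicates disagree tend to receive intermediate co-association values.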
Keywords :
bootstrapping; genetics; inference mechanisms; statistical analysis; bootstrap samples; coexpressed genes; integrative analysis; molecular mechanism; noise reduction; quasi-MCMC algorithm; replicated gene expression data; resampling based clustering algorithm; statistical inference; Algorithm design and analysis; Bioinformatics; Clustering algorithms; Computational biology; Data models; Genomics; Gene clustering; mixed effect model; replicated microarray data
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2015.2403320