Title of article :
Clustering gene expression time course data using mixtures of multivariate t-distributions
Author/Authors :
McNicholas, Paul D. and Subedi, Sanjeena
Issue Information :
Journal with serial number, year 2012
Pages :
14
From page :
1114
To page :
1127
Abstract :
Clustering gene expression time course data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Statistically, the problem of clustering time course data is a special case of the more general problem of clustering longitudinal data. In this paper, a very general and flexible model-based technique is used to cluster longitudinal data. Mixtures of multivariate t-distributions are utilized, with a linear model for the mean and a modified Cholesky-decomposed covariance structure. Constraints are placed upon the covariance structure, leading to a novel family of mixture models, including parsimonious models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters, including the component degrees of freedom, are estimated using an expectation-maximization algorithm and two different approaches to model selection are considered. The models are applied to simulated data to illustrate their efficacy; this includes a comparison with their Gaussian analogues—the use of these Gaussian analogues with a linear model for the mean is novel in itself. Our family of multivariate t mixture models is then applied to two real gene expression time course data sets and the results are discussed. We conclude with a summary, suggestions for future work, and a discussion about constraining the degrees of freedom parameter.
Keywords :
Gene expression , Time course data , Cholesky decomposition , Mixture models , Multivariate t-distributions , Model-based clustering
Journal title :
Journal of Statistical Planning and Inference
Serial Year :
2012
Record number :
2221857