DocumentCode :
3122878
Title :
Estimating an Optimal Correlation Structure from Replicated Molecular Profiling Data Using Finite Mixture Models
Author :
Acharya, Lipi R. ; Zhu, Dongxiao
Author_Institution :
Dept. of Comput. Sci., Univ. of New Orleans, New Orleans, LA, USA
fYear :
2009
fDate :
13-15 Dec. 2009
Firstpage :
119
Lastpage :
124
Abstract :
Estimating the correlation structure of a gene set is a ubiquitous problem in many pattern analyses of replicated molecular profiling data. However, the commonly used Maximum Likelihood Estimate (MLE) approaches do not automatically accommodate replicated measurements. Often an ad hoc preprocessing step is needed, e.g. averaging (weighted, unweighted, or something in between), which might wipe out important patterns of low magnitude and/or cancel out patterns of similar magnitude. We treat each replicate individually as a random variable and design a finite mixture model to estimate an optimal correlation structure from replicated molecular profiling data. Assuming that the measurements are independent and identically distributed (i.i.d.) samples from a mixture of two multivariate normal distributions, one with a constrained set of parameters and the other with an unconstrained parameter structure, we employ an Expectation-Maximization (EM) algorithm to estimate the component parameters. We carry out a comparative study, including both simulations and real-world data analysis, to assess the estimation of correlation structure using the proposed model and the constrained model given by the first component of the mixture. The two models were further tested for their performance in clustering real-world data. The mixture model approach is shown to perform better overall.
Keywords :
biology computing; expectation-maximisation algorithm; genetics; molecular biophysics; normal distribution; parameter estimation; component parameters estimation; expectation-maximization algorithm; finite mixture models; gene set; maximum likelihood estimates; multivariate normal distributions; optimal correlation structure; pattern analysis; replicated molecular profiling data; ubiquitous problem; unconstrained parameter structure; Application software; Clustering algorithms; Computer science; Data analysis; Gaussian distribution; Machine learning; Maximum likelihood estimation; Pattern analysis; Random variables; Robustness;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
978-0-7695-3926-3
Type :
conf
DOI :
10.1109/ICMLA.2009.53
Filename :
5381811