Title :
Analyzing gene expression time-courses
Author :
Schliep, Alexander ; Costa, Ivan G. ; Steinhoff, Christine ; Schönhuth, Alexander
Author_Institution :
Dept. for Comput. Biol., Max-Planck-Inst. für Molecular Genetics, Berlin, Germany
Abstract :
Measuring gene expression over time can provide important insights into basic cellular processes. Identifying groups of genes with similar expression time-courses is a crucial first step in the analysis. Because biologically relevant groups frequently overlap, as genes have several distinct roles in those cellular processes, this is a difficult problem for classical clustering methods. We use a mixture model to circumvent this principal problem, with hidden Markov models (HMMs) as effective and flexible components. We show that the ensuing estimation problem can be addressed with additional labeled data (partially supervised learning of mixtures) through a modification of the expectation-maximization (EM) algorithm. Good starting points for the mixture estimation are obtained through a modification of Bayesian model merging, which allows us to learn a collection of initial HMMs. We infer groups from mixtures with a simple information-theoretic decoding heuristic, which quantifies the level of ambiguity in group assignment. The effectiveness is shown with high-quality annotation data. As the HMMs we propose capture asynchronous behavior by design, the groups we find are also asynchronous. Synchronous subgroups are obtained from a novel algorithm based on Viterbi paths. We show the suitability of our HMM mixture approach on biological and simulated data and through favorable comparison with previous approaches. Software implementing the method is freely available under the GPL from http://ghmm.org/gql.
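The partially supervised EM step and the entropy-based decoding mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the per-component log-likelihoods log P(x_i | k) are already computed (for HMM components, e.g. by the forward algorithm), incorporates labeled time-courses by clamping their posterior to the known group (one common way to use labels in EM), and scores ambiguity with the normalized Shannon entropy of the posterior against a hypothetical threshold of 0.5. The function names, the toy numbers, and the threshold are illustrative assumptions; this is neither the GHMM API nor the authors' exact heuristic.

```python
import math
from typing import List, Optional, Sequence, Tuple


def e_step(loglik: Sequence[Sequence[float]],
           weights: Sequence[float],
           labels: Sequence[Optional[int]]) -> List[List[float]]:
    """Posterior P(k | x_i) for each time-course i and component k.

    loglik[i][k] is log P(x_i | component k). Labeled items (labels[i] is an
    int) are clamped to a one-hot posterior on their known group (assumed
    scheme); unlabeled items get the Bayes posterior. Weights must be > 0.
    """
    posteriors = []
    for row, label in zip(loglik, labels):
        k_count = len(row)
        if label is not None:
            posteriors.append([1.0 if k == label else 0.0 for k in range(k_count)])
            continue
        scores = [math.log(w) + ll for w, ll in zip(weights, row)]
        top = max(scores)  # subtract the max before exp (log-sum-exp trick)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        posteriors.append([e / total for e in exps])
    return posteriors


def update_weights(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """M-step for the mixture weights: average posterior mass per component."""
    n = len(posteriors)
    return [sum(p[k] for p in posteriors) / n for k in range(len(posteriors[0]))]


def normalized_entropy(p: Sequence[float]) -> float:
    """Shannon entropy of p divided by log(K), so the result lies in [0, 1]."""
    if len(p) < 2:
        return 0.0
    h = -sum(q * math.log(q) for q in p if q > 0.0)
    return h / math.log(len(p))


def decode(posteriors: Sequence[Sequence[float]],
           threshold: float = 0.5) -> List[Tuple[int, float, bool]]:
    """(most likely group, normalized entropy, ambiguous flag) per item.

    The 0.5 threshold is a hypothetical choice for illustration.
    """
    out = []
    for p in posteriors:
        best = max(range(len(p)), key=lambda k: p[k])
        h = normalized_entropy(p)
        out.append((best, h, h > threshold))
    return out


if __name__ == "__main__":
    # Toy log-likelihoods for three time-courses under two components.
    loglik = [[0.0, -10.0], [-1.0, -1.0], [-1.0, -1.0]]
    post = e_step(loglik, [0.5, 0.5], [None, None, 1])  # third item labeled
    print(update_weights(post))
    for item in decode(post):
        print(item)
```

In this toy run the first time-course is confidently assigned to component 0, the second is equally likely under both components and is flagged as ambiguous, and the third keeps its label regardless of its likelihoods. Clamping labeled items is what lets a small amount of annotation steer the mixture weights and, in a full EM loop, the component parameters.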
Keywords :
Bayes methods; cellular biophysics; decoding; estimation theory; genetics; hidden Markov models; learning (artificial intelligence); molecular biophysics; physiological models; Bayesian model merging; Viterbi paths; cellular processes; expectation-maximization algorithm; gene expression time-courses; hidden Markov models; information-theoretic decoding heuristic; mixture estimation; partially supervised learning; Bayesian methods; Biological system modeling; Clustering algorithms; Clustering methods; Decoding; Gene expression; Hidden Markov models; Merging; Supervised learning; Time measurement; Mixture modeling; gene expression; hidden Markov models; partially supervised learning; time-course analysis; Algorithms; Artificial Intelligence; Computer Simulation; Gene Expression Profiling; Markov Chains; Models, Genetic; Models, Statistical; Multigene Family; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Time Factors;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2005.31