Title :
Missing Value Estimation for Time Series Microarray Data Using Linear Dynamical Systems Modeling
Author :
Phong, Connie ; Singh, Raul
Author_Institution :
Univ. of California, San Francisco
Abstract :
The analysis of gene expression time series obtained from microarray experiments can be effectively exploited to understand a wide range of biological phenomena, from the homeostatic dynamics of cell cycle systems to the response of key genes to the onset of cancer or infectious disease. However, microarray data frequently contain a significant number of missing values, which makes it difficult to apply common multivariate analysis methods, all of which require complete expression matrices. To preserve the experimentally expensive non-missing data points in time series gene expression data, methods are needed that estimate the missing values in a way that preserves the latent interdependencies among time points within individual expression profiles. We therefore propose modeling gene expression profiles as simple linear and Gaussian dynamical systems and applying the Kalman filter to estimate missing values. While other current advanced estimation methods are either sensitive to parameters with no theoretical means of selection or attempt to learn statically from inherently dynamical data, our approach is advantageous precisely because it makes minimal assumptions that are consistent with the biology. We demonstrate the efficiency of our approach by evaluating its performance in estimating artificially introduced missing values in two different time series data sets, and we compare it to a Bayesian approach dependent on the eigenvectors of the gene expression matrix as well as to a gene-wise average imputation of missing values.
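The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of Kalman-filter imputation, not the authors' implementation. It assumes a scalar random-walk state observed with Gaussian noise for a single gene profile; the function name `kalman_impute`, the noise variances, and the initialisation are all illustrative assumptions.

```python
from typing import List, Optional


def kalman_impute(profile: List[Optional[float]],
                  process_var: float = 0.1,
                  obs_var: float = 0.05) -> List[float]:
    """Fill None entries of one expression profile with Kalman-filter estimates.

    Assumed model (illustrative, not taken from the paper):
        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)   # latent expression level
        y_t = x_t + v_t,      v_t ~ N(0, obs_var)       # measured expression level
    """
    observed = [y for y in profile if y is not None]
    if not observed:
        raise ValueError("profile has no observed values")

    mean = observed[0]  # assumption: start the state at the first observed value
    var = 1.0           # assumption: broad initial uncertainty
    filled: List[float] = []

    for y in profile:
        # Predict step: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        if y is None:
            # Missing time point: no update is possible, so the prediction
            # itself serves as the estimate.
            filled.append(mean)
        else:
            # Update step: blend the prediction with the measurement.
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
            filled.append(y)  # observed values are kept unchanged
    return filled
```

Calling `kalman_impute([1.0, None, 3.0, None])` returns a list of the same length in which the observed entries are unchanged and each `None` is replaced by the filter's one-step prediction. Under this sketch, leading missing values fall back to the first observed value, and a smoother (a backward pass) would be needed to let later observations inform earlier gaps.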
Keywords :
Kalman filters; biology computing; cancer; genetics; time series; Gaussian dynamical systems; Kalman filter; advanced estimation; artificially introduced missing value estimation; biological phenomena; cell cycle systems; complete expression matrices; dynamical data; gene expression time series; homeostatic dynamics; infectious disease; linear dynamical systems modeling; microarray experiments; modeling gene expression profiles; multivariate analysis; nonmissing data points; time series data sets; time series gene expression data; time series microarray data; Application software; Bayesian methods; Bioinformatics; Covariance matrix; Diseases; Gene expression; Genomics; Modeling; Principal component analysis; Proteins; Dynamical systems; Kalman Filtering; Microarrays; Missing value estimation; Time Series
Conference_Title :
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location :
Okinawa
Print_ISBN :
978-0-7695-3096-3
DOI :
10.1109/WAINA.2008.23