Title :
An Integrative DTW-based imputation method for gene expression time series data
Author :
Kostadinova, Elena ; Boeva, Veselka ; Boneva, Liliana ; Tsiporkova, Elena
Author_Institution :
Comput. Syst. & Technol. Dept., Tech. Univ. of Sofia, Plovdiv, Bulgaria
Abstract :
Gene expression microarrays are the most commonly available source of high-throughput biological data. They are widely employed to study many aspects of gene regulation and function, ranging from the global cell-cycle control of microorganisms to cancer in humans. Microarray experiments often generate data sets with multiple missing values. Many algorithms for gene expression data analysis require a complete data matrix, so accurate estimation of the missing entries is crucial for their optimal use. This need has driven the development of various microarray imputation methods. However, most of these approaches are not particularly suitable for time series expression profiles. Moreover, their performance is unsatisfactory for data sets with high rates of missing data or small numbers of samples. A further drawback of all these methods is that they base their estimates solely on a single expression matrix and use no additional data sources to impute the missing entries. Motivated by these limitations, we propose an imputation algorithm that is particularly suited to estimating missing values in gene expression time series data, using information contained in multiple related data sets. The proposed algorithm first identifies an appropriate set of estimation matrices, using the Dynamic Time Warping (DTW) distance to measure the similarity between gene expression matrices. Next, it uses the same distance measure to evaluate the similarity between gene expression profiles and applies a hybrid aggregation algorithm that combines the inter-gene similarities across the selected matrices in order to identify estimation genes. Finally, the expression profiles of these estimation genes are used to obtain the imputed values.
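A minimal Python sketch of the three steps outlined above follows. It is an illustration under stated assumptions, not the authors' IDTWimpute implementation. The assumptions are: DTW uses the textbook dynamic-programming recurrence with absolute-difference local cost, and missing entries are simply dropped from each profile before alignment; rows with the same index in different matrices describe the same gene; the distance between two matrices is taken to be the mean gene-wise DTW distance, and the `n_select` closest matrices are kept; the hybrid aggregation step, which the abstract does not specify, is replaced by a plain mean of the per-matrix DTW distances; and the missing value is the inverse-distance-weighted mean of the `k` nearest estimation genes. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW distance with |x - y| as the local cost.

    Assumption: NaNs (missing values) are dropped from each profile
    before alignment; this is a simplification made for the sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i, j] = step + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def matrix_distance(A, B):
    """Hypothetical matrix-level distance: mean gene-wise DTW distance.

    Assumes row g of A and row g of B describe the same gene; the number
    of time points (columns) may differ because DTW allows unequal lengths.
    """
    return float(np.mean([dtw_distance(A[g], B[g]) for g in range(A.shape[0])]))


def select_matrices(target, candidates, n_select):
    """Keep the n_select candidate matrices closest to the target (assumed rule)."""
    return sorted(candidates, key=lambda M: matrix_distance(target, M))[:n_select]


def impute_value(target, others, gene, col, k=2, eps=1e-9):
    """Estimate target[gene, col] from k estimation genes.

    Gene-gene DTW distances are averaged over the target and the selected
    matrices; this plain mean stands in for the unspecified hybrid aggregation.
    """
    matrices = [target] + list(others)
    scores = []
    for h in range(target.shape[0]):
        if h == gene or np.isnan(target[h, col]):
            continue  # an estimation gene must be observed at the missing point
        d = np.mean([dtw_distance(M[gene], M[h]) for M in matrices])
        scores.append((float(d), h))
    nearest = sorted(scores)[:k]
    weights = np.array([1.0 / (d + eps) for d, _ in nearest])
    values = np.array([target[h, col] for _, h in nearest])
    # Inverse-distance-weighted mean of the estimation genes' values.
    return float(np.dot(weights, values) / weights.sum())


# Toy data: 4 genes x 4 time points, gene 0 missing at time point 2.
target = np.array([
    [1.0, 2.0, np.nan, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [4.0, 3.0, 2.0, 1.0],
])
related = target + 0.05          # a closely related (made-up) data set
related[0, 2] = 3.05             # gene 0 is observed in this experiment
unrelated = 10.0 * target[:, ::-1]
chosen = select_matrices(target, [unrelated, related], n_select=1)
estimate = impute_value(target, chosen, gene=0, col=2, k=2)
print(chosen[0] is related, round(estimate, 3))
```

Because DTW aligns sequences of unequal length, the related data sets in this sketch need not share the target's time points. The O(nm) recurrence is adequate for short time series but would dominate the running time on genome-scale data.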
The estimation accuracy of the proposed algorithm, called Integrative DTW-based Imputation (IDTWimpute), is benchmarked against that of two other imputation methods (KNNimpute and DTWimpute) in terms of root mean squared difference. In addition, the impact of the three methods on the quality of gene clustering is evaluated using the k-means and k-medoids clustering algorithms and two different cluster validation measures.
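The accuracy benchmark can be sketched in Python as follows, again as an illustration rather than the paper's code. The sketch hides randomly chosen entries of a complete matrix, imputes each one, and scores the result with the plain root mean squared difference between the hidden and imputed values; the abstract does not say whether the paper normalizes this measure. The baseline is a simplified KNNimpute-style estimator (Euclidean distance over co-observed time points, inverse-distance-weighted mean of the k nearest genes). The names `rmsd`, `knn_impute_value`, and `evaluate`, their parameters, and the toy matrix are hypothetical.

```python
import numpy as np


def rmsd(true_values, imputed_values):
    """Root mean squared difference between hidden and imputed entries.

    Plain, unnormalized form (an assumption; the paper's variant may differ).
    """
    t = np.asarray(true_values, dtype=float)
    e = np.asarray(imputed_values, dtype=float)
    return float(np.sqrt(np.mean((t - e) ** 2)))


def knn_impute_value(X, gene, col, k=2, eps=1e-9):
    """Simplified KNNimpute-style estimate of X[gene, col].

    Distance: Euclidean over time points observed in both genes.
    Estimate: inverse-distance-weighted mean of the k nearest genes.
    """
    scores = []
    for h in range(X.shape[0]):
        if h == gene or np.isnan(X[h, col]):
            continue  # a neighbour must be observed at the missing point
        both = ~np.isnan(X[gene]) & ~np.isnan(X[h])
        d = np.sqrt(np.sum((X[gene, both] - X[h, both]) ** 2))
        scores.append((float(d), h))
    nearest = sorted(scores)[:k]
    w = np.array([1.0 / (d + eps) for d, _ in nearest])
    v = np.array([X[h, col] for _, h in nearest])
    return float(np.dot(w, v) / w.sum())


def evaluate(complete, impute_fn, n_missing, seed=0):
    """Hide n_missing random entries, impute each one, and return the RMSD."""
    complete = np.asarray(complete, dtype=float)
    rng = np.random.default_rng(seed)
    X = complete.copy()
    flat = rng.choice(X.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat, X.shape)
    X[rows, cols] = np.nan  # simulate missing values
    estimates = [impute_fn(X, g, c) for g, c in zip(rows, cols)]
    return rmsd(complete[rows, cols], estimates)


# Toy complete matrix: 6 genes x 5 time points.
complete = np.arange(30, dtype=float).reshape(6, 5)
score = evaluate(complete, knn_impute_value, n_missing=3, seed=0)
print(score)
```

Any estimator with the signature `(X, gene, col)` can be passed to `evaluate`, so several imputation methods can be compared on exactly the same hidden entries by fixing `seed`.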
Keywords :
bioinformatics; data analysis; lab-on-a-chip; pattern clustering; time series; biological data; cluster validation measures; data matrix; dynamic time warping distance; gene expression; gene expression matrices; gene regulation; global cell cycle control; hybrid aggregation algorithm; imputation algorithm; integrative DTW-based imputation method; microarray imputation methods; microarrays; root mean squared difference; single expression matrix; time series data; Clustering algorithms; Estimation; Heuristic algorithms; Partitioning algorithms; Time series analysis; DTW distance; data integration; gene clustering; microarray gene expression data; missing value estimation;
Conference_Title :
Intelligent Systems (IS), 2012 6th IEEE International Conference
Conference_Location :
Sofia
Print_ISBN :
978-1-4673-2276-8
DOI :
10.1109/IS.2012.6335145