DocumentCode
1873294
Title
An Integrative DTW-based imputation method for gene expression time series data
Author
Kostadinova, Elena ; Boeva, Veselka ; Boneva, Liliana ; Tsiporkova, Elena
Author_Institution
Comput. Syst. & Technol. Dept., Tech. Univ. of Sofia, Plovdiv, Bulgaria
fYear
2012
fDate
6-8 Sept. 2012
Firstpage
258
Lastpage
263
Abstract
Gene expression microarrays are the most commonly available source of high-throughput biological data. They are widely employed for studying many different aspects of gene regulation and function, ranging from understanding the global cell-cycle control of microorganisms to cancer in humans. Gene expression microarray experiments often generate data sets with multiple missing values. Many algorithms for gene expression data analysis require a complete data matrix, and therefore the accurate estimation of missing entries is crucial for their optimal usage. This need has driven the development of various microarray imputation methods. However, most of these approaches are not particularly suitable for time series expression profiles. Moreover, their performance is not satisfactory for data sets with high rates of missing data or small numbers of samples. Another drawback of all these methods is that their estimation is based solely on a single expression matrix, and no additional data sources are used to impute the missing entries. Motivated by these limitations, we propose an imputation algorithm that is particularly suited to the estimation of missing values in gene expression time series data and that uses information contained in multiple related data sets. The proposed algorithm first identifies an appropriate set of estimation matrices by using the Dynamic Time Warping (DTW) distance to measure similarities between gene expression matrices. Next, it employs the same distance measure to evaluate the similarity between gene expression profiles and applies a hybrid aggregation algorithm to combine the inter-gene similarities across the selected matrices in order to identify estimation genes. The expression profiles of those estimation genes are then used to obtain the final imputation.
The estimation accuracy of the proposed algorithm, called Integrative DTW-based Imputation (IDTWimpute), is benchmarked against that of two other imputation methods (KNNimpute and DTWimpute) in terms of root mean squared difference. In addition, the impact of the three methods on the quality of gene clustering is evaluated by using k-means and k-medoids clustering algorithms and two different cluster validation measures.
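The two quantities named in the abstract, the DTW distance between expression profiles and the root mean squared difference used for benchmarking, can be illustrated with a minimal sketch. This is the textbook DTW recurrence with an absolute-difference local cost and a plain RMS difference over paired values; it is not the authors' IDTWimpute implementation, and the function names `dtw_distance` and `rmsd` are hypothetical names chosen for this illustration. The abstract does not specify the local cost, any warping window, or any normalization, so those choices are assumptions.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference and no
    warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again with an earlier b
                cost[i][j - 1],      # b[j-1] is matched again with an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def rmsd(true_values, imputed_values):
    """Root mean squared difference between true and imputed entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - e) ** 2 for t, e in zip(true_values, imputed_values))
    return math.sqrt(squared / len(true_values))


if __name__ == "__main__":
    # Two profiles with the same shape but a stretched middle section.
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
    print(rmsd([1.0, 1.0], [4.0, 4.0]))
```

Because DTW allows one time point to align with several points in the other sequence, profiles that differ only by a local stretch in time are assigned a small distance, which is the property that makes the measure attractive for time series expression data.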
Keywords
bioinformatics; data analysis; lab-on-a-chip; pattern clustering; time series; biological data; cluster validation measures; data matrix; dynamic time warping distance; gene expression; gene expression matrices; gene regulation; global cell cycle control; hybrid aggregation algorithm; imputation algorithm; integrative DTW based Imputation Method; microarray imputation methods; root mean squared difference; single expression matrix; time series data; Clustering algorithms; Estimation; Gene expression; Heuristic algorithms; Partitioning algorithms; Time series analysis; DTW distance; data integration; gene clustering; microarray gene expression data; missing value estimation
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems (IS), 2012 6th IEEE International Conference
Conference_Location
Sofia
Print_ISBN
978-1-4673-2276-8
Type
conf
DOI
10.1109/IS.2012.6335145
Filename
6335145