• DocumentCode
    1873294
  • Title

    An Integrative DTW-based imputation method for gene expression time series data

  • Author

    Kostadinova, Elena ; Boeva, Veselka ; Boneva, Liliana ; Tsiporkova, Elena

  • Author_Institution
    Comput. Syst. & Technol. Dept., Tech. Univ. of Sofia, Plovdiv, Bulgaria
  • fYear
    2012
  • fDate
    6-8 Sept. 2012
  • Firstpage
    258
  • Lastpage
    263
  • Abstract
    Gene expression microarrays are the most commonly available source of high-throughput biological data. They are widely employed for studying many different aspects of gene regulation and function, ranging from understanding the global cell-cycle control of microorganisms to cancer in humans. Gene expression microarray experiments often generate data sets with multiple missing values. Many algorithms for gene expression data analysis require a complete data matrix, and therefore the accurate estimation of missing entries is crucial for their optimal usage. This need has driven the development of various microarray imputation methods. However, most of these approaches are not particularly suitable for time series expression profiles. Moreover, their performance is not satisfactory for data sets with high rates of missing data or small numbers of samples. Another drawback of all these methods is that their estimation is based solely on a single expression matrix; no additional data sources are used to impute the missing entries. Motivated by these limitations, we propose herein an imputation algorithm that is particularly suited for the estimation of missing values in gene expression time series data using information that is contained in multiple related data sets. The proposed algorithm initially identifies an appropriate set of estimation matrices by using the Dynamic Time Warping (DTW) distance to measure similarities between gene expression matrices. Next, it employs the same distance measure to evaluate the similarity between gene expression profiles and applies a hybrid aggregation algorithm to combine the inter-gene similarities across the selected matrices in order to identify estimation genes. The expression profiles of those estimation genes are then used to obtain the final imputation. The estimation accuracy of the proposed algorithm, called Integrative DTW-based Imputation (IDTWimpute), is benchmarked against that of two other imputation methods (KNNimpute and DTWimpute) in terms of root mean squared difference. In addition, the impact of the three methods on the quality of gene clustering is evaluated by using k-means and k-medoids clustering algorithms and two different cluster validation measures.
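    The two measures named in the abstract, the DTW distance and the root mean squared difference, can be illustrated with a minimal sketch. This is not the authors' IDTWimpute implementation: it assumes the textbook dynamic-programming form of DTW with an absolute-difference local cost, and the function names `dtw_distance` and `rms_difference` are hypothetical. The selection of estimation matrices and the hybrid aggregation step are not specified in this record and are not shown.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: local cost is the absolute difference |a[i] - b[j]|;
    the record does not state which local cost the paper uses.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def rms_difference(true_values, imputed_values):
    """Root mean squared difference between true and imputed entries."""
    if len(true_values) != len(imputed_values):
        raise ValueError("sequences must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))
```

    Because DTW allows one time point to be matched to several, two expression profiles with the same shape but shifted or stretched in time can still be judged similar, which is why it suits time series better than a point-by-point distance.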
  • Keywords
    bioinformatics; data analysis; lab-on-a-chip; pattern clustering; time series; biological data; cluster validation measures; data matrix; dynamic time warping distance; gene expression; gene expression matrices; gene regulation; global cell cycle control; hybrid aggregation algorithm; imputation algorithm; integrative DTW based Imputation Method; microarray imputation methods; root mean squared difference; single expression matrix; time series data; Clustering algorithms; Estimation; Gene expression; Heuristic algorithms; Partitioning algorithms; Time series analysis; DTW distance; data integration; gene clustering; microarray gene expression data; missing value estimation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (IS), 2012 6th IEEE International Conference
  • Conference_Location
    Sofia
  • Print_ISBN
    978-1-4673-2276-8
  • Type

    conf

  • DOI
    10.1109/IS.2012.6335145
  • Filename
    6335145