• DocumentCode
    79847
  • Title
    Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach
  • Author
    Muxuan Liang ; Zhizhong Li ; Ting Chen ; Jianyang Zeng
  • Author_Institution
    Dept. of Math. Sci., Tsinghua Univ., Beijing, China
  • Volume
    12
  • Issue
    4
  • fYear
    2015
  • fDate
    July-Aug. 1 2015
  • Firstpage
    928
  • Lastpage
    937
  • Abstract
    Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are specifically designed to exploit both the deep intrinsic statistical properties of each input modality and the complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.
  • Keywords
    RNA; belief networks; cancer; data analysis; feature extraction; genetics; genomics; medical computing; molecular biophysics; pattern clustering; tumours; unsupervised learning; DNA methylation; advancing personalized therapy; cancer data analysis; cancer pathogenesis; cancer patient clustering; cancer subtype identification; complex cross-modality correlations; contrastive divergence; cross-modality correlations; disease pathogenesis; gene expression; high-throughput sequencing technologies; input modality; integrative clustering approaches; integrative data analysis; integrative data analysis approach; intramodality correlations; intrinsic statistical properties; joint latent model; key genes; latent feature extraction; machine learning model; miR-29a; miRNA expression; multimodal DBN based data analysis; multimodal DBN model; multimodal deep belief network; multimodal deep learning approach; multiplatform cancer data; multiplatform genomic data; multiple input modalities; ovarian cancer patients; personalized cancer therapy; practical learning algorithm; tumor samples; unsupervised manner; Bioinformatics; Cancer; Computational biology; DNA; Data analysis; Data models; Genomics; Multi-platform cancer data analysis; clinical data; genomic data; identification of cancer subtypes; restricted Boltzmann machine;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2377729
  • Filename
    6977954