Title :
Latent Feature Decompositions for Integrative Analysis of Multi-Platform Genomic Data
Author :
Gregory, Karl B. ; Momin, Amin A. ; Coombes, Kevin R. ; Baladandayuthapani, Veerabhadran
Author_Institution :
Dept. of Stat., Texas A&M Univ., College Station, TX, USA
Abstract :
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme data set from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. 
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features.
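The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes principal components as the latent features and forms within- and between-platform interactions as products of latent scores. The function names (`pca_scores`, `latent_design`), the platform labels, and the random data are hypothetical and are not taken from the paper. The Bayesian model averaging step is not shown.

```python
import numpy as np
from itertools import combinations


def pca_scores(X, k):
    """Return the first k principal component scores of X (samples x features)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                      # scores = U * singular values


def latent_design(platforms, k):
    """Stack per-platform latent features and all pairwise products of them.

    Pairs drawn from the same platform are within-platform interactions;
    pairs drawn from different platforms are between-platform interactions.
    """
    Z = np.hstack([pca_scores(X, k) for X in platforms.values()])
    names = [f"{p}_pc{j + 1}" for p in platforms for j in range(k)]
    pairs = list(combinations(range(Z.shape[1]), 2))
    inter = np.column_stack([Z[:, a] * Z[:, b] for a, b in pairs])
    inter_names = [f"{names[a]}:{names[b]}" for a, b in pairs]
    return np.hstack([Z, inter]), names + inter_names


# Hypothetical simulated data: 50 matched samples on two platforms.
rng = np.random.default_rng(0)
platforms = {
    "expr": rng.normal(size=(50, 200)),   # stand-in for gene expression
    "mirna": rng.normal(size=(50, 80)),   # stand-in for microRNA
}
design, names = latent_design(platforms, k=2)
print(design.shape)
print(names)
```

With two platforms and two components each, the design matrix holds 4 latent main effects plus 6 pairwise interaction columns, which is far smaller than the 280 original features. A model selection or averaging procedure would then operate on these columns rather than on the raw features.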
Keywords :
Bayes methods; RNA; bioinformatics; cancer; feature selection; genetics; genomics; least squares approximations; matrix decomposition; molecular biophysics; molecular configurations; principal component analysis; tumours; Bayesian model averaging procedures; Cancer Genome Atlas; between-platform interactions; complex interactions; complex relationships; copy number; copy number variation associated gene dosing; disease progression; diverse genomics data; diverse molecular features; epigenetic characteristics; epigenetic regulation; explicit selection; gene expression; genetic characteristics; glioblastoma multiforme data set; integrative analysis; integrative methods; latent feature decompositions; matched samples; methylation data; microRNA; multiplatform genomic data; nonnegative matrix factorization; novel modeling framework; outcome-related features; partial least squares; platform-specific features; principal components; promoter methylation; relevant cross-platform interactions; selected prognostic genes; sparked research efforts; ultrahigh dimensionality; variable selection; within-platform interactions; Analytical models; Bioinformatics; Biomedical signal processing; Cancer; Computational biology; Data models; Genomics; Matrix decomposition; Predictive models; Statistical analysis; Bayesian model averaging; Latent feature; genomic data; high-dimensional; integrative models; interactions;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2014.2325035