Data Fusion in Metabolomics Using Coupled Matrix and Tensor Factorizations

Author

Acar, Evrim ; Bro, Rasmus ; Smilde, Age K.

Author_Institution

Dept. of Food Sci., Univ. of Copenhagen, Frederiksberg, Denmark

Volume

103

Issue

fYear

2015

Firstpage

1602

Lastpage

1620

Abstract

With the goal of identifying biomarkers and patterns related to certain conditions or diseases, metabolomics focuses on detecting chemical substances in biological samples such as urine and blood using a number of analytical techniques, including nuclear magnetic resonance (NMR) spectroscopy, liquid chromatography-mass spectrometry (LC-MS), and fluorescence spectroscopy. Data sets measured using these methods provide partly complementary information, and their joint analysis has the potential to reveal underlying structures that are otherwise difficult to extract. While vast amounts of data can be collected using different analytical methods, data fusion remains a challenging task, particularly when the goal is to capture the underlying factors and use them for interpretation, e.g., for biomarker identification. Furthermore, many data fusion applications require joint analysis of heterogeneous data sets (i.e., data sets in the form of higher order tensors and matrices) with shared/unshared factors. To jointly analyze such heterogeneous data sets, we formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which has already proved useful in many data mining applications, and discuss its extension to a structure-revealing data fusion model, i.e., a data fusion model that can identify shared and unshared factors. The methods traditionally used for data fusion in the presence of shared/unshared factors are matrix factorization-based. Using both simulations and prototypical experimental coupled data sets, we assess the performance of various state-of-the-art data fusion methods and demonstrate that, while matrix factorization-based approaches have limitations when used for joint analysis of heterogeneous data sets, the structure-revealing CMTF model can successfully capture the underlying factors by exploiting the low-rank structure of higher order data sets.
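The abstract does not state the CMTF objective itself. As a minimal sketch only, the following assumes the commonly used formulation in which a third-order tensor X (e.g., samples x emission x excitation) and a matrix Y (e.g., samples x features) share the factor matrix A in the sample mode, and the fit is measured by a sum of squared residuals. The function name `cmtf_loss` and the factor names A, B, C, V are illustrative assumptions, not taken from the article.

```python
import numpy as np

def cmtf_loss(X, Y, A, B, C, V):
    """Squared-error CMTF objective (assumed form, for illustration).

    X : tensor of shape (I, J, K), modeled by a rank-R CP model [[A, B, C]]
    Y : matrix of shape (I, M), modeled as A @ V.T
    A : (I, R) factor matrix shared by X and Y (the coupled mode)
    B : (J, R), C : (K, R) remaining CP factors of X
    V : (M, R) factor matrix of Y
    """
    # CP reconstruction: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    # Matrix reconstruction using the shared factor A
    Y_hat = A @ V.T
    return 0.5 * np.sum((X - X_hat) ** 2) + 0.5 * np.sum((Y - Y_hat) ** 2)

# Small synthetic check: data built exactly from the factors gives zero loss.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))
V = rng.standard_normal((6, 2))
X = np.einsum("ir,jr,kr->ijk", A, B, C)
Y = A @ V.T
loss_exact = cmtf_loss(X, Y, A, B, C, V)
loss_perturbed = cmtf_loss(X + 0.1, Y, A, B, C, V)
```

In this sketch every component is shared between X and Y. A structure-revealing variant, as described in the abstract, would additionally need a way to let individual components be present in one data set but not the other; that part is not shown here.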

Keywords

NMR spectroscopy; biochemistry; biological techniques; biology computing; biomedical engineering; chromatography; data analysis; fluorescence spectroscopy; mass spectroscopic chemical analysis; matrix decomposition; medical computing; numerical analysis; sensor fusion; spectrochemical analysis; tensors; CMTF problem; LC-MS; biomarker identification; chemical substance detection; coupled matrix and tensor factorizations; heterogeneous data sets; higher order matrices; higher order tensors; joint data analysis; liquid chromatography-mass spectrometry; metabolomics; nuclear magnetic resonance spectroscopy; structure revealing CMTF model; structure revealing data fusion model; unshared factors; Analytical models; Brain modeling; Data integration; Data models; Multimodal sensors; Tensile stress; Data fusion; matrix factorizations; tensor factorizations

fLanguage

English

Journal_Title

Proceedings of the IEEE

Publisher

IEEE

ISSN

0018-9219

Type

jour

DOI

10.1109/JPROC.2015.2438719

Filename

7202834

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=740193