DocumentCode :
740193
Title :
Data Fusion in Metabolomics Using Coupled Matrix and Tensor Factorizations
Author :
Acar, Evrim ; Bro, Rasmus ; Smilde, Age K.
Author_Institution :
Dept. of Food Sci., Univ. of Copenhagen, Frederiksberg, Denmark
Volume :
103
Issue :
9
Year :
2015
Firstpage :
1602
Lastpage :
1620
Abstract :
With a goal of identifying biomarkers/patterns related to certain conditions or diseases, metabolomics focuses on the detection of chemical substances in biological samples such as urine and blood using a number of analytical techniques, including nuclear magnetic resonance (NMR) spectroscopy, liquid chromatography-mass spectrometry (LC-MS), and fluorescence spectroscopy. Data sets measured using these methods provide partly complementary information, and their joint analysis has the potential to reveal underlying structures, which are, otherwise, difficult to extract. While we can collect vast amounts of data using different analytical methods, data fusion remains a challenging task, in particular, when the goal is to capture the underlying factors and use them for interpretation, e.g., for biomarker identification. Furthermore, many data fusion applications require joint analysis of heterogeneous (i.e., in the form of higher order tensors and matrices) data sets with shared/unshared factors. In order to jointly analyze such heterogeneous data sets, we formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which has already proved useful in many data mining applications, and discuss its extension to a structure-revealing data fusion model, i.e., a data fusion model that can identify shared and unshared factors. The traditional methods commonly used for data fusion in the presence of shared/unshared factors are matrix factorization-based methods. Using both simulations and prototypical experimental coupled data sets, we assess the performance of various state-of-the-art data fusion methods and demonstrate that while matrix factorization-based approaches have limitations when used for joint analysis of heterogeneous data sets, the structure-revealing CMTF model can successfully capture the underlying factors by exploiting the low-rank structure of higher order data sets.
Keywords :
NMR spectroscopy; biochemistry; biological techniques; biology computing; biomedical engineering; chromatography; data analysis; fluorescence spectroscopy; mass spectroscopic chemical analysis; matrix decomposition; medical computing; numerical analysis; sensor fusion; spectrochemical analysis; tensors; CMTF problem; LC-MS; biomarker identification; chemical substance detection; coupled matrix and tensor factorizations; heterogeneous data sets; higher order matrices; higher order tensors; joint data analysis; liquid chromatography-mass spectrometry; metabolomics; nuclear magnetic resonance spectroscopy; structure revealing CMTF model; structure revealing data fusion model; unshared factors; Analytical models; Brain modeling; Data integration; Data models; Multimodal sensors; Tensile stress; Data fusion; matrix factorizations; tensor factorizations
Language :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
Journal article
DOI :
10.1109/JPROC.2015.2438719
Filename :
7202834