DocumentCode :
741235
Title :
Methodologies for Cross-Domain Data Fusion: An Overview
Author :
Zheng, Yu
Author_Institution :
Microsoft Research, Beijing, China
Volume :
1
Issue :
1
fYear :
2015
Firstpage :
16
Lastpage :
34
Abstract :
Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, which significantly distinguishes cross-domain data fusion from the traditional data fusion studied in the database community. This paper not only introduces the high-level principles of each category of methods, but also gives examples in which these techniques are used to handle real big data problems. In addition, this paper positions existing works in a framework, exploring the relationships and differences between different data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.
Keywords :
Big data; Data integration; Data mining; Feature extraction; Roads; Semantics; Trajectory; cross-domain data mining; data fusion; deep neural networks; matrix factorization; multi-modality data representation; multi-view learning; probabilistic graphical models; transfer learning; urban computing
fLanguage :
English
Journal_Title :
Big Data, IEEE Transactions on
Publisher :
IEEE
Type :
jour
DOI :
10.1109/TBDATA.2015.2465959
Filename :
7230259