DocumentCode
19343
Title
SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets
Author
Araujo, Samur ; Tran, Duc Thanh ; de Vries, Arjen P. ; Schwabe, Daniel
Author_Institution
Tech. Univ. of Delft, Delft, Netherlands
Volume
27
Issue
5
fYear
2015
fDate
May 1 2015
Firstpage
1397
Lastpage
1440
Abstract
State-of-the-art instance matching approaches do not perform well when matching instances across heterogeneous datasets. This shortcoming stems from their core operation, direct matching, which compares instances in the source dataset directly with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. To resolve this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show that our approach greatly improves the quality of state-of-the-art systems, especially on difficult matching tasks.
Keywords
data integration; distributed databases; pattern matching; semantic Web; SERIMI; class-based matching; class-of-interest; core operation; heterogeneous datasets; instance matching; public benchmarks; source dataset; target dataset; Approximation methods; Benchmark testing; Complexity theory; Data models; Resource description framework; Semantics; Standards; direct matching
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2014.2365779
Filename
6940278