DocumentCode :
3142571
Title :
Efficient entity resolution methods for heterogeneous information spaces
Author :
Papadakis, George ; Nejdl, Wolfgang
Author_Institution :
L3S Res. Center, Leibniz Univ. Hannover, Hannover, Germany
fYear :
2011
fDate :
11-16 April 2011
Firstpage :
304
Lastpage :
307
Abstract :
The Web of Data encompasses a voluminous, yet constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging them is the detection (and merging) of information that describes the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to the noise, the loose schema binding, and the unprecedented heterogeneity inherent in it. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced quite promising results.
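The attribute-agnostic idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple token-based scheme in which every token found in any attribute value becomes a block key, and it assumes that redundant comparisons are avoided by collecting candidate pairs in a set. The names `build_blocks`, `candidate_pairs`, and the sample profiles are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_blocks(profiles):
    """Group entity ids by the tokens in their values, ignoring attribute names.

    `profiles` maps an entity id to a dict of attribute -> value.
    Assumption: a block is keyed by a lowercased alphanumeric token.
    """
    blocks = defaultdict(set)
    for entity_id, profile in profiles.items():
        for value in profile.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block with a single entity yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Return each pair of co-occurring entities once, even if the two
    entities share several blocks (assumed form of block processing)."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical profiles with different schemas describing overlapping entities.
profiles = {
    "a": {"name": "Wolfgang Nejdl", "city": "Hannover"},
    "b": {"fullName": "Nejdl, Wolfgang", "location": "Hannover, Germany"},
    "c": {"label": "George Papadakis"},
}

blocks = build_blocks(profiles)
pairs = candidate_pairs(blocks)
```

In this sketch, profiles "a" and "b" end up in the same blocks although they share no attribute names, and the pair is compared once rather than once per shared block. Profile "c" shares no token with the others, so it is never compared.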
Keywords :
data structures; semantic Web; Web of data; attribute agnostic mechanism; block processing method; entity resolution method; heterogeneous information space; information detection; intelligent technique; quadratic task; structured data set; Buildings; Couplings; Erbium; Information services; Internet; Noise; Redundancy;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering Workshops (ICDEW), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-9195-7
Electronic_ISBN :
978-1-4244-9194-0
Type :
conf
DOI :
10.1109/ICDEW.2011.5767671
Filename :
5767671