DocumentCode :
3168742
Title :
Coreference detection in XML metadata
Author :
Szymczak, Marcin ; Zadrozny, Slawomir ; De Tre, Guy
Author_Institution :
Syst. Res. Inst., Warsaw, Poland
fYear :
2013
fDate :
24-28 June 2013
Firstpage :
1354
Lastpage :
1359
Abstract :
Preserving data quality is an important issue in data collection management. One crucial task is the detection of duplicate objects (called coreferent objects), which describe the same entity but in different ways. In this paper we present a method for detecting coreferent objects in metadata, in particular in XML schemas. Our approach consists of comparing the paths from the root element to a given element in the schema. Each path precisely defines the context and location of a specific element in the schema. Path matching is based on comparing the individual steps of which the paths are composed. The uncertainty about the matching of steps is expressed with possibilistic truth values and aggregated using the Sugeno integral. The discovered coreference of paths can help determine the coreference of different XML schemas.
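The aggregation step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the step comparison, the path alignment, or the fuzzy measure, so everything below is an assumption made for illustration. Each possibilistic truth value is simplified to a single score in [0, 1], steps are compared positionally with a string-similarity ratio, and the fuzzy measure is the cardinality-based one, mu(A) = |A| / n. The names `step_similarity`, `cardinality_measure`, `sugeno_integral`, and `path_match` are hypothetical.

```python
from difflib import SequenceMatcher


def step_similarity(a: str, b: str) -> float:
    """Assumed step comparison: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cardinality_measure(k: int, n: int) -> float:
    """Assumed fuzzy measure: a set of k out of n steps has weight k / n."""
    return k / n


def sugeno_integral(scores, measure) -> float:
    """Discrete Sugeno integral: max over i of min(i-th largest score, mu(top i))."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    return max(min(s, measure(i, n)) for i, s in enumerate(ordered, start=1))


def path_match(path_a: str, path_b: str) -> float:
    """Score two slash-separated schema paths by aggregating step similarities."""
    steps_a = [s for s in path_a.split("/") if s]
    steps_b = [s for s in path_b.split("/") if s]
    n = max(len(steps_a), len(steps_b))
    if n == 0:
        return 1.0  # two empty paths are treated as identical
    # Positional alignment (an assumption); unmatched trailing steps score 0.
    scores = [step_similarity(a, b) for a, b in zip(steps_a, steps_b)]
    scores += [0.0] * (n - len(scores))
    return sugeno_integral(scores, cardinality_measure)


if __name__ == "__main__":
    print(path_match("/library/book/title", "/library/book/title"))
    print(path_match("/library/book/title", "/lib/book/name"))
```

With the cardinality-based measure, the result is high only when a large share of the steps match well, so one strongly matching step cannot by itself make two paths count as coreferent.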
Keywords :
XML; data acquisition; fuzzy set theory; integral equations; meta data; pattern matching; Sugeno integral; XML metadata; XML schemas; coreferent object detection; data collection management; data quality preservation; duplicate object detection; path matching; possibilistic truth values; Context; Databases; Educational institutions; Lattices; Q measurement; Uncertainty; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint
Conference_Location :
Edmonton, AB
Type :
conf
DOI :
10.1109/IFSA-NAFIPS.2013.6608598
Filename :
6608598