DocumentCode :
2081183
Title :
Managing uncertainty of XML schema matching
Author :
Cheng, Reynold ; Gong, Jian ; Cheung, David W.
Author_Institution :
Dept. of Comput. Sci., Univ. of Hong Kong, Hong Kong, China
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
297
Lastpage :
308
Abstract :
Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that for XML schemas, their possible mappings have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner, and can be generated efficiently. Moreover, it supports the evaluation of probabilistic twig query (PTQ), which returns the probability of portions of an XML document that match the query pattern. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ, and present an efficient solution for it. The second challenge we have tackled is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. An extensive evaluation on realistic datasets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.
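The probabilistic twig query described above can be illustrated with a toy sketch. It assumes possible-worlds semantics: each possible mapping carries a probability, and an answer's probability is the sum of the probabilities of the mappings that produce it. As simplifying assumptions, the query is reduced to a tuple of target-schema element names (no twig structure, no matching against an actual XML document), and the mapping data and the names `evaluate_query`, `top_k` and `POSSIBLE_MAPPINGS` are hypothetical. The sketch evaluates every mapping independently; it does not model the block tree, whose purpose is to exploit the overlap among mappings.

```python
from collections import defaultdict
from heapq import nlargest

# Hypothetical toy data: each possible mapping sends target-schema element
# names to source-schema element names and carries a probability (sum = 1).
POSSIBLE_MAPPINGS = [
    ({"order": "purchase", "buyer": "customer"}, 0.5),
    ({"order": "purchase", "buyer": "client"}, 0.3),
    ({"order": "invoice", "buyer": "customer"}, 0.2),
]


def evaluate_query(query, mappings):
    """Rewrite the query under every mapping and sum the probabilities
    of the mappings that yield the same rewritten (source-side) pattern."""
    scores = defaultdict(float)
    for mapping, prob in mappings:
        # A mapping contributes only if it maps every queried element.
        if all(label in mapping for label in query):
            rewritten = tuple(mapping[label] for label in query)
            scores[rewritten] += prob
    return dict(scores)


def top_k(query, mappings, k):
    """Naive top-k: compute all answer probabilities, keep the k largest."""
    scores = evaluate_query(query, mappings)
    return nlargest(k, scores.items(), key=lambda item: item[1])
```

For example, `evaluate_query(["order"], POSSIBLE_MAPPINGS)` assigns `("purchase",)` the probability 0.5 + 0.3, because two mappings agree on that element, and `("invoice",)` the probability 0.2. The naive `top_k` computes every answer before ranking; an efficient top-k method would instead aim to avoid that full evaluation.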
Keywords :
XML; data integrity; learning (artificial intelligence); probability; COMA++; PTQ; XML schema matching; block tree; k-highest probabilities; machine learning technologies; probabilistic twig query; querying possible mappings; Catalogs; Companies; Computer science; Databases; Machine learning; Pattern matching; Technology management; Tree data structures; Uncertainty; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447868
Filename :
5447868