DocumentCode :
1626721
Title :
XCluster Synopses for Structured XML Content
Author :
Polyzotis, Neoklis ; Garofalakis, Minos
Author_Institution :
University of California, Santa Cruz
fYear :
2006
Firstpage :
63
Lastpage :
63
Abstract :
We tackle the difficult problem of summarizing the path/branching structure and value content of an XML database that comprises both numeric and textual values. We introduce a novel XML-summarization model, termed XCLUSTERs, that enables accurate selectivity estimates for the class of twig queries with numeric-range, substring, and textual IR predicates over the content of XML elements. In a nutshell, an XCLUSTER synopsis represents an effective clustering of XML elements based on both their structural and value-based characteristics. By leveraging techniques for summarizing XML-document structure as well as numeric and textual data distributions, our XCLUSTER model provides the first known unified framework for handling path/branching structure and different types of element values. We detail the XCLUSTER model, and develop a systematic framework for the construction of effective XCLUSTER summaries within a specified storage budget. Experimental results on synthetic and real-life data verify the effectiveness of our XCLUSTER synopses, clearly demonstrating their ability to accurately summarize XML databases with mixed-value content. To the best of our knowledge, ours is the first work to address the summarization problem for structured XML content in its full generality.
Keywords :
Abstracts; Cost function; Data engineering; Data models; Databases; Internet; Large scale integration; Query processing; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
Print_ISBN :
0-7695-2570-9
Type :
conf
DOI :
10.1109/ICDE.2006.175
Filename :
1617431