DocumentCode :
2831023
Title :
An Entropy-Based Characterization of the Heterogeneity of XML Collections
Author :
Sanz, Ismael ; Mesiti, Marco ; Guerrini, Giovanna ; Berlanga, Rafael
Author_Institution :
Univ. Jaume I, Castellon de la Plana
fYear :
2008
fDate :
1-5 Sept. 2008
Firstpage :
238
Lastpage :
242
Abstract :
The concept of heterogeneity is very important in XML data management, since many common applications must deal with large and complex collections which do not conform to a schema. Heterogeneity in XML collections can be present at many different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches.
Keywords :
XML; distributed processing; entropy; XML collection heterogeneity; XML collections; XML data management; entropy-based characterization; formal characterization; information-theoretic considerations; relevant XML collection retrieval; Books; Conference management; Databases; Diversity reception; Expert systems; Humans; Random variables; Vocabulary; collection characterization; heterogeneous data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
Conference_Location :
Turin
ISSN :
1529-4188
Print_ISBN :
978-0-7695-3299-8
Type :
conf
DOI :
10.1109/DEXA.2008.55
Filename :
4624722