DocumentCode :
3227848
Title :
Combining structural and textual contexts for compressing semistructured databases
Author :
Adiego, Joaquín ; de la Fuente, P. ; Navarro, Gonzalo
Author_Institution :
Dpto. de Informatica, Univ. de Valladolid, Spain
fYear :
2005
fDate :
26-30 Sept. 2005
Firstpage :
68
Lastpage :
73
Abstract :
We describe a compression technique for semistructured documents, called SCMPPM, which combines the prediction by partial matching (PPM) technique with the structural contexts model (SCM) technique. SCMPPM takes advantage of the context information usually implicit in the structure of the text. The idea is to use a separate PPM model to compress the text that lies inside each different structure type (e.g., each different XML tag). The intuition is that the distribution of the texts that belong to a given structure type should be similar, and different from that of other structure types. This should allow PPM to make better predictions. We test our idea against plain PPM modelling, as well as against other structure-aware techniques. Results show that the new compression method obtains significant improvements in compression ratios.
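The idea of keeping one statistical model per structure type can be illustrated with a minimal sketch. This is not the authors' SCMPPM implementation: the adaptive order-1 character model below is a simplified stand-in for PPM, it only estimates an ideal code length in bits rather than producing compressed output, and it ignores the cost of coding the tags themselves. The names `AdaptiveOrder1Model`, `text_segments`, and `estimate_bits` are hypothetical and introduced here only for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


class AdaptiveOrder1Model:
    """Adaptive order-1 byte model with add-one smoothing.

    Assumption: a simplified stand-in for a PPM model. It returns the
    ideal code length in bits and emits no actual compressed stream.
    """

    ALPHABET_SIZE = 256  # byte-sized symbols

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        self.prev = None  # last symbol seen by this model (its context)

    def cost_and_update(self, data: bytes) -> float:
        bits = 0.0
        for sym in data:
            ctx = self.prev
            # Smoothed probability of sym given the previous symbol.
            p = (self.counts[ctx][sym] + 1) / (self.totals[ctx] + self.ALPHABET_SIZE)
            bits += -math.log2(p)
            # Adapt the model after coding the symbol.
            self.counts[ctx][sym] += 1
            self.totals[ctx] += 1
            self.prev = sym
        return bits


def text_segments(xml_string):
    """Yield (enclosing tag, text) pairs; tail text is ignored for brevity."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if elem.text and elem.text.strip():
            yield elem.tag, elem.text.strip()


def estimate_bits(xml_string, per_tag=True):
    """Estimated bits for all text, using one model per tag or one shared model."""
    models = defaultdict(AdaptiveOrder1Model)
    total = 0.0
    for tag, text in text_segments(xml_string):
        key = tag if per_tag else None  # None routes everything to a single model
        total += models[key].cost_and_update(text.encode("utf-8"))
    return total


doc = (
    "<lib><book><title>Dune</title><year>1965</year></book>"
    "<book><title>Emma</title><year>1815</year></book></lib>"
)
print("one model per tag:", estimate_bits(doc, per_tag=True))
print("single shared model:", estimate_bits(doc, per_tag=False))
```

Calling `estimate_bits` with `per_tag=False` gives the plain, structure-unaware baseline for comparison. On a document this small the two estimates say little; whether separate models help depends on how much the text distributions differ between tags and on how much data each model sees.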
Keywords :
data compression; database management systems; text analysis; XML tag; compression technique; partial matching; semistructured database; semistructured document; structural context model; textual context; Compressors; Context modeling; Databases; Huffman coding; Libraries; Natural languages; Predictive models; Testing; Vocabulary; XML; Compression Model; PPM; Semistructured Documents
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science, 2005. ENC 2005. Sixth Mexican International Conference on
ISSN :
1550-4069
Print_ISBN :
0-7695-2454-0
Type :
conf
DOI :
10.1109/ENC.2005.15
Filename :
1592202