DocumentCode :
2742371
Title :
Lempel-Ziv compression of structured text
Author :
Adiego, Joaquin ; Navarro, Gonzalo ; de la Fuente, P.
Author_Institution :
Dpto. de Informatica, Valladolid Univ., Spain
fYear :
2004
fDate :
23-25 March 2004
Firstpage :
112
Lastpage :
121
Abstract :
We describe a novel Lempel-Ziv approach for compressing structured documents, called LZCS, which takes advantage of redundant information that can appear in the structure. The main idea is that frequently repeated subtrees may exist, and these can be replaced by a backward reference to their first occurrence. The main advantage is that compressed documents generated by LZCS are easy to display, access at random, and navigate. In a second stage, processed documents can be further compressed using a semiadaptive technique, so that random access and navigability remain possible. LZCS is especially efficient at compressing collections of highly structured data, such as XML forms, invoices, e-commerce and web-service exchange documents. The comparison against structure-based and standard compressors shows that LZCS is a competitive choice for this type of document, while the others are not well suited to supporting navigation or random access.
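The core idea in the abstract (replace a repeated subtree by a backward reference to its first occurrence) can be illustrated with a minimal sketch. This is not the authors' LZCS implementation. It assumes a toy document model in which an element is a `(tag, children)` tuple and text is a plain `str`, and it identifies each element by its preorder position among non-reference elements. The names `Ref`, `compress`, and `decompress` are hypothetical. Only element subtrees are replaced; text leaves are left as-is. The sketch omits the paper's second, semiadaptive stage.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical backward reference: preorder position of the first occurrence."""
    pos: int


# Toy document model (assumption): an element is (tag, tuple_of_children); text is str.
Node = Union[str, tuple, Ref]


def compress(root: Node) -> Node:
    """Replace every repeated element subtree by a Ref to its first occurrence."""
    seen = {}  # full subtree (hashable nested tuple) -> its preorder position

    def walk(node):
        if isinstance(node, str):
            return node  # text leaves are kept verbatim in this sketch
        if node in seen:
            return Ref(seen[node])  # repeated subtree: emit a backward reference
        seen[node] = len(seen)  # first occurrence gets the next preorder position
        tag, children = node
        return (tag, tuple(walk(c) for c in children))

    return walk(root)


def decompress(packed: Node) -> Node:
    """Rebuild the original tree, numbering non-Ref elements in the same preorder."""
    table = []  # preorder position -> fully expanded subtree

    def walk(node):
        if isinstance(node, Ref):
            # A Ref never points to an ancestor (a tree cannot contain itself),
            # so the referenced entry is already complete here.
            return table[node.pos]
        if isinstance(node, str):
            return node
        tag, children = node
        pos = len(table)
        table.append(None)  # reserve this element's position before its children
        expanded = (tag, tuple(walk(c) for c in children))
        table[pos] = expanded
        return expanded

    return walk(packed)
```

For example, a collection of two identical invoices, each listing the same item twice, compresses to one full invoice containing one full item and a reference for each repeat:

```python
item = ("item", ("pen",))
invoice = ("invoice", (item, item))
doc = ("invoices", (invoice, invoice))
packed = compress(doc)
# packed == ("invoices", (("invoice", (("item", ("pen",)), Ref(2))), Ref(1)))
assert decompress(packed) == doc
```

Because the references are resolved while walking the tree, a reader can expand only the branch it navigates into, which is the property the abstract highlights. Hashing whole subtrees as dictionary keys is quadratic in the worst case. It is used here only for clarity.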
Keywords :
XML; data compression; document handling; text analysis; LZCS; Lempel-Ziv compression; XML data; backward reference; document processing; navigation; random access; redundant information; semiadaptive technique; structured text documents
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2004. Proceedings. DCC 2004
ISSN :
1068-0314
Print_ISBN :
0-7695-2082-0
Type :
conf
DOI :
10.1109/DCC.2004.1281456
Filename :
1281456