DocumentCode :
3421255
Title :
XML syntax conscious compression
Author :
Harrusi, S. ; Averbuch, A. ; Yehudai, A.
Author_Institution :
Sch. of Comput. Sci., Tel Aviv Univ.
fYear :
2006
fDate :
28-30 March 2006
Lastpage :
411
Abstract :
XML is the standard format for content representation and sharing on the Web. XML is a highly verbose language, especially regarding the duplication of meta-data in the form of elements and attributes. As XML content becomes more widespread, so does the demand to reduce XML data volume. The paper presents the best XML compression ratios reported to date. The advantage of the proposed method over other XML compression techniques is that it uses syntactic information to enhance compression; it is therefore a fully syntax-based XML compression method. The syntactic information is parsed from XML documents by an innovative XML parser. We developed a new XML parser-generator for that purpose. Our parser-generator uses a syntactic dictionary (DTD, XML-Schema, etc.) of the XML to create efficient and compact XML parsers. This XML parser-generator is adapted to streaming technologies and can be used in a wide variety of XML applications such as validators, converters, gateways, routers, browsers, editors, etc. The parsers' symbols are encoded by a prediction by partial matching (PPM) codec. We compare the performance of our algorithm with that of other existing XML compression techniques. The proposed compression algorithm achieves a better compression ratio than other XML compression techniques that do not utilize syntactic structure. The superiority of our compression technique is more evident when it is tested on XML data sets that contain only tags and no free text.
Keywords :
XML; computational linguistics; data compression; grammars; meta data; XML data sets; XML syntax conscious compression; compression enhancement; metadata duplication; parser-generator; partial prediction matching codec; syntactic dictionary; verbose language; Codecs; Compression algorithms; Computer science; Data compression; Dictionaries; Production; Tagging; Testing; Uniform resource locators; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 2006. DCC 2006. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-2545-8
Type :
conf
DOI :
10.1109/DCC.2006.85
Filename :
1607275