DocumentCode :
2291173
Title :
Exploiting Structure Recurrence in XML Processing
Author :
Zhou, Dong
Author_Institution :
DoCoMo USA Labs., San Jose, CA
fYear :
2008
fDate :
14-18 July 2008
Firstpage :
311
Lastpage :
324
Abstract :
Transmitting, parsing, and transforming XML documents (and messages) are particularly costly in cellular environments because of the limitations in handset and access network capabilities. A large part of XML processing cost comes from processing the structure of the documents. By structure we refer to the entity that results from an XML document after its text nodes and attribute values are removed. While XML is a flexible, extensible language, real-world data exchanged in XML often exhibit some degree of stability in their organization. In other words, a computer receiving an XML data item of a certain structure is likely to encounter the same structure among future data items. Since most structure-related processing is identical for data items with identical structure, the overall performance of XML processing will improve if redundancy in structure-related processing can be reduced. In this paper we present the concept of structure encoding and approaches to quickly identifying recurring structures, including one relying on a collision-resistant hash function. The paper then describes in detail techniques for improving the performance of XML transmission, tokenization, parsing, and transformation by using structure encoding. Evaluation experiments with our prototype implementation and an industry benchmark suite demonstrate large potential performance improvements in the presence of structure recurrence: up to 7 times faster for DOM-style parsing, up to 38 times faster for transformation, and up to 97.4% size reduction when 20% of the text and attribute values change. In the worst case, when there is no structure recurrence, structure encoding causes an overhead of about 11.1% for DOM-style parsing and about 8.9% for transformation.
Keywords :
XML; cryptography; document handling; encoding; grammars; DOM-style parsing; XML documents parsing; XML documents transformation; XML documents transmission; XML tokenization; access network capabilities; cellular environments; collision-resistant hash function; structure encoding; structure recurrence; Bandwidth; Costs; Data processing; Delay; Encoding; Mobile handsets; Prototypes; Stability; Web services; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Engineering, 2008. ICWE '08. Eighth International Conference on
Conference_Location :
Yorktown Heights, NY
Print_ISBN :
978-0-7695-3261-5
Electronic_ISBN :
978-0-7695-3261-5
Type :
conf
DOI :
10.1109/ICWE.2008.46
Filename :
4577894