Title of article :
2LP: A double-lazy XML parser
Author/Authors :
Fernando Farfán, Vagelis Hristidis, Raju Rangaswami
Issue Information :
Journal issue with serial number, year 2009
Pages :
19
From page :
145
To page :
163
Abstract :
XML is acknowledged as the most effective format for data encoding and exchange over domains ranging from the World Wide Web to desktop applications. However, large-scale adoption into actual system implementations is being slowed down due to the inefficiency of its document-parsing methods. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have a key drawback—they must load the entire XML document in order to extract the overall document structure before document parsing can be performed. We have developed a framework for efficient parsing based on the idea of placing internal physical pointers within the XML document that allow the navigation process to skip large portions of the document during parsing. We show how to generate such internal pointers in a way that optimizes parsing using constructs supported by the current W3C XML standard. A double-lazy parser (2LP) exploits these internal pointers to efficiently parse the document. The usage of supported W3C constructs to create internal pointers allows 2LP to be backward compatible—i.e., the pointer-augmented documents can be parsed by current XML parsers. We also implemented a mechanism to efficiently parse large documents with limited main memory, thereby overcoming a major limitation in current solutions. We study our pointer generation and parsing algorithms both theoretically and experimentally, and show that they perform considerably better than existing approaches.
Keywords :
Optimization, Document management, DOM, Trees, XML
Journal title :
Information Systems
Serial Year :
2009
Record number :
1230085