• DocumentCode
    2708200
  • Title

    AXECHOP: a grammar-based compressor for XML

  • Author

    Leighton, Gregory ; Diamond, Jim ; Müldner, Tomasz

  • Author_Institution
    Jodrey Sch. of Comput. Sci., Acadia Univ., Wolfville, NS, Canada
  • fYear
    2005
  • fDate
    29-31 March 2005
  • Firstpage
    467
  • Abstract
    Summary form only given. XML is gaining widespread acceptance as a standard for storing and transmitting structured data. One of the drawbacks of XML is that it is quite verbose: an XML representation of a set of data can easily be ten times as large as a more economical representation of the data. To overcome this limitation, we present a compression scheme tailored specifically to XML named AXECHOP. The compression strategy used in AXECHOP begins by dividing the source XML document into structural and data segments. The former is represented using a byte tokenization scheme that preserves the original structure of the document (i.e., it maintains the proper nesting and ordering of elements, attributes, and data values). The MPM compression algorithm is used to generate a context-free grammar capable of deriving this original structure, and the grammar is passed through an adaptive arithmetic coder before being written to the compressed file. The document's data is organized into a series of containers (where container membership is determined by the identity of the XML element or attribute that encloses the data) and then the Burrows-Wheeler transform (BWT) is applied to the contents of each dictionary, with the results being appended to the compressed file.
  • Keywords
    XML; adaptive codes; arithmetic codes; context-free grammars; data compression; data structures; transforms; AXECHOP; Burrows-Wheeler transform; MPM compression algorithm; XML representation; adaptive arithmetic coder; byte tokenization scheme; container membership; context-free grammar; data segments; document structural segments; grammar-based compressor; structured data; Arithmetic; Compression algorithms; Computer science; Containers; Data compression; Dictionaries; Length measurement; Testing; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2005. Proceedings. DCC 2005
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2309-9
  • Type

    conf

  • DOI
    10.1109/DCC.2005.20
  • Filename
    1402224
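
The first and last stages described in the abstract (splitting the document into a structural token stream plus per-name data containers, then applying the BWT to each container) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the reserved token values, the `@` prefix for attribute containers, and the separator and sentinel characters are all assumptions made for illustration. The MPM grammar construction and the adaptive arithmetic coder are not shown, text following a closing tag is ignored, and the BWT is the naive sorted-rotations version.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

END, TEXT = 0, 1  # reserved structural tokens (hypothetical values)


def split_document(xml_text):
    """Split XML into a structural token stream and per-name data containers."""
    root = ET.fromstring(xml_text)
    token_ids = {}                  # element/attribute name -> integer token
    structure = []                  # nesting and ordering, no data values
    containers = defaultdict(list)  # enclosing name -> data values, in order

    def tok(name):
        # Names get tokens 2, 3, ... in order of first appearance.
        return token_ids.setdefault(name, len(token_ids) + 2)

    def walk(elem):
        structure.append(tok(elem.tag))
        for attr, value in elem.attrib.items():
            key = "@" + attr        # assumed convention for attribute names
            structure.append(tok(key))
            containers[key].append(value)
        if elem.text and elem.text.strip():
            structure.append(TEXT)  # marks where a data value belongs
            containers[elem.tag].append(elem.text.strip())
        for child in elem:
            walk(child)
        structure.append(END)       # closes the current element

    walk(root)
    return structure, dict(containers), token_ids


def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler transform; assumes the sentinel is absent from s."""
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def transform_containers(containers, separator="\x01"):
    """Apply the BWT to the joined contents of each container."""
    return {name: bwt(separator.join(values)) for name, values in containers.items()}


if __name__ == "__main__":
    doc = '<a x="1"><b>hi</b><b>yo</b></a>'
    structure, containers, token_ids = split_document(doc)
    print(structure)
    print(containers)
    print(transform_containers(containers))
```

Grouping values by their enclosing element or attribute places similar strings next to each other, which is the property the BWT stage relies on. In a fuller pipeline the `structure` list would be the input to the grammar-based stage.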