Title :
Using partial-matching approach with Sequitur for contact-based coding
Author :
Lakhani, Gopal ; Radhakrishnan Sethurama
Author_Institution :
Texas Tech. Univ., Lubbock, TX, USA
Abstract :
This paper presents a Sequitur, which is a powerful realization of the grammar-based text compression approach. First, it derives a context-free grammar, which generates the given input as its only language. Then it represents the grammar as a single sequence of symbols, where a symbol represents either a rule head or an alphabet, and finally it uses arithmetic coder to code the sequence. Sequitur encodes each symbol individually. A modified approach is to code a rule symbol in the context of the last alphabet of the preceding rule symbol. Thereby, the partial matching (PM) approach is followed to realize code reduction. The crux of this approach is to maintain multiple dictionaries, each containing rules that begin with the same alphabet.
Keywords :
arithmetic codes; context-free grammars; data compression; sequential codes; text analysis; Sequitur; Sequitur encoding; arithmetic coder; code reduction; contact-based coding; context-free grammar; grammar-based text compression approach; partial-matching approach; preceding rule symbol; rule head; sequence code; symbol representation; symbol sequence; text dictionary; Arithmetic; Data compression; Decoding; Dictionaries;
Conference_Titel :
Data Compression Conference, 2004. Proceedings. DCC 2004
Print_ISBN :
0-7695-2082-0
DOI :
10.1109/DCC.2004.1281524