Title :
A tree based binary encoding of text using LZW algorithm
Author :
Acharya, Tinku ; Mukherjee, Amar
Author_Institution :
Inst. of Syst. Res., Maryland Univ., College Park, MD, USA
Abstract :
Summary form only given. The most popular adaptive dictionary coding scheme used for text compression is the LZW algorithm. In the LZW algorithm, a changing dictionary contains common strings that have been encountered so far in the text. The dictionary can be represented by a dynamic trie. The input text is examined character by character and the longest substring (called a prefix string) of the text which already exists in the trie, is replaced by a pointer to a node in the trie which represents the prefix string. Motivation of our research is to explore a variation of the LZW algorithm for variable-length binary encoding of text (we call it the LZWA algorithm) and to develop a memory-based VLSI architecture for text compression. We proposed a new methodology to represent the trie in the form of a binary tree (we call it a binary trie) to maintain the dictionary used in the LZW scheme. This binary tree maintains all the properties of the trie and can easily be mapped into memory. As a result, the common substrings can be encoded using variable length prefix binary codes. The prefix codes enable us to uniquely decode the text in its original form. The algorithm outperforms the usual LZW scheme when the size of the text is small (usually less than 5 K). Depending upon the characteristics of the text, the improvement of the compression ratio has been achieved around 10-30% compared to the LZW scheme. But its performance degrades for larger size texts
Keywords :
adaptive codes; algorithm theory; binary sequences; data compression; glossaries; tree data structures; variable length codes; word processing; LZW algorithm; LZWA algorithm; adaptive dictionary coding; binary tree; binary trie; compression ratio; dynamic trie; performance; prefix string; text coding; text compression; text decoding; tree based binary encoding; variable length prefix binary codes; variable-length binary encoding; Binary trees; Computer science; Data compression; Decoding; Degradation; Dictionaries; Educational institutions; Encoding; Memory architecture; Very large scale integration;
Conference_Titel :
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-7012-6
DOI :
10.1109/DCC.1995.515573