Title :
Printed English compression by dictionary encoding
Author_Institution :
RCA Laboratories, Princeton, N. J.
Date :
3/1/1967
Abstract :
The ability of a dictionary encoder to reduce the redundancy of printed English text is evaluated by simulation on a general-purpose digital computer. The dictionary encoder matches segments of the input text to entries of a stored dictionary that contains frequently occurring sequences of letters. The text is thus represented as the succession of code designations corresponding to the selected dictionary entries. Since, for a normal piece of text, fewer binary digits are needed to specify the code designations than to specify the text itself, the encoding produces a compressed equivalent of the original input. In addition to evaluating encoder performance, the simulator collects language statistics, which are used to optimize the encoder logic and the dictionary entries. For a broad type of English-language text (news dispatches prepared for newspaper publication), the number of binary digits required to represent a piece of text can be reduced by 50 percent when using a 1000-entry dictionary. While compression better than 50 percent is theoretically possible, it may be difficult to realize; compression of the input text to 60 to 70 percent of its original size, however, appears to be easily realizable with a small dictionary.
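The matching step described in the abstract can be illustrated with a minimal sketch. It is not the paper's encoder: the greedy longest-match rule, the fixed-length codes, the toy dictionary, and the 6-bit-per-character baseline are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math

# Toy dictionary (an assumption, not the paper's): every single character
# that may occur, plus a few frequently occurring letter sequences.
DICTIONARY = list("abcdefghijklmnopqrstuvwxyz ") + ["the ", "and ", "ing", "tion", "of "]


def encode(text, dictionary):
    """Replace segments of text with dictionary indices, longest match first."""
    index = {entry: i for i, entry in enumerate(dictionary)}
    max_len = max(len(entry) for entry in dictionary)
    codes = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate segment first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            segment = text[pos:pos + length]
            if segment in index:
                codes.append(index[segment])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry covers {text[pos]!r}")
    return codes


def decode(codes, dictionary):
    """Recover the text by concatenating the selected dictionary entries."""
    return "".join(dictionary[c] for c in codes)


def bits_per_code(dictionary_size):
    """Binary digits needed for a fixed-length code over the dictionary."""
    return math.ceil(math.log2(dictionary_size))


def compressed_fraction(text, codes, dictionary_size, bits_per_char=6):
    """Encoded size as a fraction of the original size.

    bits_per_char=6 is an assumed baseline for the unencoded text.
    """
    return (len(codes) * bits_per_code(dictionary_size)) / (len(text) * bits_per_char)


sample = "the cat and the dog"
codes = encode(sample, DICTIONARY)
assert decode(codes, DICTIONARY) == sample
print(len(sample), "characters ->", len(codes), "codes")
print("fraction of original size:", compressed_fraction(sample, codes, len(DICTIONARY)))
```

Under these assumptions a 1000-entry dictionary needs 10 binary digits per code designation, so compression depends on how many characters each selected entry covers on average.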
Keywords :
Chromium; Computational modeling; Computer simulation; Constraint theory; Decoding; Dictionaries; Encoding; Frequency estimation; Impedance matching; Speech;
Journal_Title :
Proceedings of the IEEE
DOI :
10.1109/PROC.1967.5496