DocumentCode
888197
Title
Printed English compression by dictionary encoding
Author
White, H.E.
Author_Institution
RCA Laboratories, Princeton, N. J.
Volume
55
Issue
3
fYear
1967
fDate
3/1/1967 12:00:00 AM
Firstpage
390
Lastpage
396
Abstract
The ability of a dictionary encoder to reduce the redundancy of printed English text is evaluated by simulation on a general-purpose digital computer. The dictionary encoder matches segments of the input text to entries of a stored dictionary that contains frequently occurring sequences of letters. The text is thus represented as the succession of code designations corresponding to the selected dictionary entries. Since, for a normal piece of text, fewer binary digits are needed to specify the code designations than the text itself, the encoding produces a compressed equivalent of the original input. In addition to evaluating encoder performance, the simulator also collects language statistics, which are used to optimize the encoder logic and the dictionary entries. For a broad type of English-language text (news dispatches prepared for newspaper publication), the number of binary digits required to represent a piece of text can be reduced by 50 percent when using a 1000-entry dictionary. Although compression better than 50 percent is theoretically possible, it may be difficult to realize; compression of the input text to 60 to 70 percent of its original size, however, appears to be easily realizable with a small dictionary.
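The encoding scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's simulator: the greedy longest-match rule, the fixed-width code designations, the 8-bit-per-character baseline, and the toy dictionary are all assumptions made for illustration, and the function names (`build_codebook`, `encode`, `decode`, `compressed_fraction`) are hypothetical.

```python
import math


def build_codebook(entries):
    """Map each dictionary entry to an integer code designation."""
    unique = list(dict.fromkeys(entries))  # drop duplicates, keep order
    return {entry: code for code, entry in enumerate(unique)}


def encode(text, codebook):
    """Encode text as a list of codes using greedy longest-match.

    Assumption: at each position the longest matching entry is chosen.
    The abstract does not state which matching rule the encoder uses.
    """
    max_len = max(len(entry) for entry in codebook)
    codes = []
    i = 0
    while i < len(text):
        # Try the longest candidate segment first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            segment = text[i:i + length]
            if segment in codebook:
                codes.append(codebook[segment])
                i += length
                break
        else:
            raise ValueError(f"no dictionary entry covers {text[i]!r}")
    return codes


def decode(codes, codebook):
    """Recover the text by looking up each code designation."""
    inverse = {code: entry for entry, code in codebook.items()}
    return "".join(inverse[code] for code in codes)


def compressed_fraction(text, codes, codebook, bits_per_char=8):
    """Encoded size divided by original size.

    Assumptions: every code uses the same number of bits, and the
    original text costs bits_per_char bits per character.
    """
    bits_per_code = max(1, math.ceil(math.log2(len(codebook))))
    return (len(codes) * bits_per_code) / (len(text) * bits_per_char)


# Toy dictionary (hypothetical): single characters guarantee coverage,
# and a few frequent letter sequences provide the compression.
entries = list("abcdefghijklmnopqrstuvwxyz ") + ["the ", "ing", "and "]
codebook = build_codebook(entries)

sample = "the thing and the king"
codes = encode(sample, codebook)
restored = decode(codes, codebook)
fraction = compressed_fraction(sample, codes, codebook)
```

Including every single character as an entry is one simple way to ensure that any input over the alphabet can be encoded; the resulting fraction depends heavily on the assumed baseline of 8 bits per character and should not be compared with the figures reported in the abstract.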
Keywords
Chromium; Computational modeling; Computer simulation; Constraint theory; Decoding; Dictionaries; Encoding; Frequency estimation; Impedance matching; Speech
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
IEEE
ISSN
0018-9219
Type
jour
DOI
10.1109/PROC.1967.5496
Filename
1447426