DocumentCode :
888197
Title :
Printed English compression by dictionary encoding
Author :
White, H.E.
Author_Institution :
RCA Laboratories, Princeton, N. J.
Volume :
55
Issue :
3
fYear :
1967
fDate :
3/1/1967 12:00:00 AM
Firstpage :
390
Lastpage :
396
Abstract :
The ability of a dictionary encoder to reduce the redundancy of printed English text is evaluated by simulation on a general-purpose digital computer. The dictionary encoder matches segments of the input text to entries of a stored dictionary, which contains frequently occurring sequences of letters. The text is thus defined as the succession of code designations corresponding to the selected dictionary entries. Since, for a normal piece of text, fewer binary digits are needed to specify the code designations than the text itself, the encoding produces a compressed equivalent of the original input. In addition to evaluating encoder performance, the simulator also collects language statistics, which are used to optimize the encoder logic and the dictionary entries. For a broad type of English-language text (news dispatches prepared for newspaper publication), the number of binary digits required to represent a piece of text can be reduced by 50 percent when using a 1000-entry dictionary. While compression better than 50 percent is theoretically possible, it may be difficult to realize; compression of the input text to 60 to 70 percent of its original size, however, appears to be easily realizable with a small dictionary.
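The matching scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's encoder: the greedy longest-match rule, the fixed-length codes, the 8-bit cost per input character, and the toy dictionary are all assumptions made for illustration, and the names `build_index`, `encode`, `decode`, and `compressed_fraction` are hypothetical.

```python
def build_index(entries):
    """Map each dictionary entry to its code designation (its list position)."""
    return {entry: code for code, entry in enumerate(entries)}


def encode(text, index):
    """Greedy longest-match encoding (an assumed matching rule).

    At each position, emit the code of the longest dictionary entry that
    matches the upcoming text, then skip past the matched segment.
    """
    max_len = max(len(entry) for entry in index)
    codes = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate segment first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            segment = text[pos:pos + length]
            if segment in index:
                codes.append(index[segment])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry matches {text[pos]!r}")
    return codes


def decode(codes, entries):
    """Recover the text by concatenating the entries the codes point to."""
    return "".join(entries[code] for code in codes)


def compressed_fraction(text, codes, dict_size, bits_per_char=8):
    """Encoded size as a fraction of the original size.

    Assumes fixed-length codes and bits_per_char bits per input character;
    both are illustrative assumptions, not figures from the paper.
    """
    bits_per_code = (dict_size - 1).bit_length()
    return (len(codes) * bits_per_code) / (len(text) * bits_per_char)


# Toy dictionary: every single character (so any lowercase text is encodable)
# plus a few frequent letter sequences chosen by hand for this example.
singles = list("abcdefghijklmnopqrstuvwxyz ")
entries = singles + ["the ", "ing ", "and ", "tion", "er "]
index = build_index(entries)

sample = "the king and the singer "
codes = encode(sample, index)
fraction = compressed_fraction(sample, codes, len(entries))
```

Including every single character in the dictionary guarantees that the encoder always finds a match, at the cost of spending a full code on an isolated letter. The fraction obtained on such a short, hand-picked sample says nothing about the figures reported for news text.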
Keywords :
Chromium; Computational modeling; Computer simulation; Constraint theory; Decoding; Dictionaries; Encoding; Frequency estimation; Impedance matching; Speech
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/PROC.1967.5496
Filename :
1447426