DocumentCode :
2399944
Title :
Experiments on the zero frequency problem
Author :
Cleary, John G. ; Teahan, W.J.
Author_Institution :
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
fYear :
1995
fDate :
28-30 Mar 1995
Firstpage :
480
Abstract :
Summary form only given. A fundamental problem in the construction of statistical techniques for data compression of sequential text is the generation of probabilities from counts of previous occurrences. Each context used in the statistical model accumulates counts of the number of times each symbol has occurred in that context. So in a binary alphabet there will be two counts C0 and C1 (the number of times a 0 or 1 has occurred). The problem then is to take the counts and generate from them a probability that the next character will be a 0 or 1. A naive estimate of the probability of character i could be obtained by the ratio pi=Ci/(C0+Ci ). A fundamental problem with this is that it will generate a zero probability if C0 or C1 is zero. Unfortunately, a zero probability prevents coding from working correctly as the “optimum” code length in this case is infinite. Consequently any estimate of the probabilities must be non-zero even in the presence of zero counts. This problem is called the zero frequency problem . A well known solution to the problem was formulated by Laplace and is known as Laplace´s law of succession. We have investigated the correctness of Laplace´s law by experiment
Keywords :
Laplace equations; data compression; encoding; probability; statistical analysis; word processing; Laplace´s law of succession; binary alphabet; code length; data compression; experiments; probabilities; sequential text; statistical model; statistical techniques; zero frequency problem; zero probability; Character generation; Computer science; Context modeling; Displays; Frequency estimation; Probability; State estimation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7012-6
Type :
conf
DOI :
10.1109/DCC.1995.515590
Filename :
515590
Link To Document :
بازگشت