DocumentCode :
928809
Title :
A convergent gambling estimate of the entropy of English
Author :
Cover, Thomas M. ; King, Roger C.
Volume :
24
Issue :
4
fYear :
1978
fDate :
7/1/1978 12:00:00 AM
Firstpage :
413
Lastpage :
421
Abstract :
In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text. If $S_n$ denotes the subject's capital after $n$ bets at 27-for-1 odds, and if it is assumed that the subject knows the underlying probability distribution for the process $X$, then the entropy estimate is $\hat{H}_n(X) = (1 - (1/n)\log_{27} S_n)\log_2 27$ bits/symbol. If the subject does not know the true probability distribution for the stochastic process, then $\hat{H}_n(X)$ is an asymptotic upper bound for the true entropy. If $X$ is stationary, $E\hat{H}_n(X) \rightarrow H(X)$, $H(X)$ being the true entropy of the process. Moreover, if $X$ is ergodic, then by the Shannon-McMillan-Breiman theorem $\hat{H}_n(X) \rightarrow H(X)$ with probability one. Preliminary indications are that English text has an entropy of approximately 1.3 bits/symbol, which agrees well with Shannon's estimate.
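The abstract's estimator can be illustrated with a minimal sketch: an idealized gambler who knows the source distribution bets proportionally on each of the 27 symbols at 27-for-1 odds, and the estimate $\hat{H}_n(X) = (1 - (1/n)\log_{27} S_n)\log_2 27$ is computed from the resulting capital. The i.i.d. source and its skewed distribution below are assumptions for demonstration only, not part of the paper, which uses human subjects betting on real English text.

```python
import math
import random

M = 27  # alphabet size: 26 letters plus space

def entropy_estimate(log27_S, n):
    # H_hat_n(X) = (1 - (1/n) log_27 S_n) * log_2 27  bits/symbol
    return (1.0 - log27_S / n) * math.log2(M)

def true_entropy(p):
    # Shannon entropy in bits/symbol of an i.i.d. source with distribution p
    return -sum(q * math.log2(q) for q in p if q > 0)

random.seed(1)
# hypothetical skewed distribution over the 27 symbols (illustration only)
weights = [1.0 / (k + 1) for k in range(M)]
p = [w / sum(weights) for w in weights]

n = 50000
log27_S = 0.0  # log base 27 of the capital S_n, starting from S_0 = 1
for _ in range(n):
    x = random.choices(range(M), weights=p)[0]
    # the proportional bettor stakes fraction p[s] of capital on each symbol s;
    # the winning stake pays 27-for-1, so capital multiplies by M * p[x]
    log27_S += math.log(M * p[x], M)

print(f"gambling estimate: {entropy_estimate(log27_S, n):.3f} bits/symbol")
print(f"true entropy:      {true_entropy(p):.3f} bits/symbol")
```

For this proportional bettor, the expected log-capital growth per bet equals $\log_2 27 - H(X)$ bits, so the estimate converges to the true entropy as $n$ grows, matching the abstract's consistency claim.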
Keywords :
Bibliographies; Entropy functions; Languages; Character generation; Entropy; Helium; Laboratories; Random processes; Random variables; Statistics; Stochastic processes; Upper bound; Yield estimation;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.1978.1055912
Filename :
1055912