Title :
Estimation of probabilities in the language model of the IBM speech recognition system
Author_Institution :
IBM T.J. Watson Research Center, Yorktown Heights, NY
Date :
8/1/1984
Abstract :
The language model probabilities are estimated by an empirical Bayes approach in which a prior distribution for the unknown probabilities is itself estimated through a novel choice of data. The predictive power of the model thus fitted is compared, by means of its experimental perplexity [1], with that of the same model as fitted by the Jelinek-Mercer deleted estimator and by the Turing-Good formulas for probabilities of unseen or rarely seen events.
Keywords :
Bayesian methods; Cities and towns; Helium; Natural languages; Power system modeling; Predictive models; Probability; Smoothing methods; Speech recognition; Vocabulary
Journal_Title :
Acoustics, Speech and Signal Processing, IEEE Transactions on
DOI :
10.1109/TASSP.1984.1164378
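
Illustrative note :
The abstract compares estimators by perplexity and mentions the Turing-Good formulas for unseen events. The sketch below is not taken from the paper; it only illustrates the two standard quantities under simple assumptions. The function names `perplexity` and `good_turing_unseen_mass` are hypothetical, the perplexity is taken as 2 raised to the average negative log2 probability the model assigns to each test word, and the Turing-Good estimate of the total probability of unseen events is taken as N1 / N (the number of events seen exactly once divided by the sample size).

```python
import math
from collections import Counter


def perplexity(probs):
    """Perplexity of a test text, given the model probability of each word.

    Assumes every probability is strictly positive (i.e. the model is smoothed).
    """
    avg_neg_log2 = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** avg_neg_log2


def good_turing_unseen_mass(counts):
    """Turing-Good estimate of the total probability of unseen events: N1 / N."""
    n1 = sum(1 for c in counts.values() if c == 1)  # events seen exactly once
    total = sum(counts.values())                    # sample size N
    return n1 / total


# Toy data, invented purely for illustration.
toy_counts = Counter("a a a b b c d".split())
toy_probs = [0.25, 0.25, 0.25, 0.25]

print(perplexity(toy_probs))
print(good_turing_unseen_mass(toy_counts))
```

A lower perplexity on held-out text indicates a better predictive model, which is the sense in which the abstract compares the empirical Bayes fit with the deleted and Turing-Good estimators.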