DocumentCode :
336824
Title :
Smoothing methods in maximum entropy language modeling
Author :
Martin, S.C. ; Ney, H. ; Zaplo, J.
Author_Institution :
Lehrstuhl für Inf., Tech. Hochschule Aachen, Germany
Volume :
1
fYear :
1999
fDate :
15-19 Mar 1999
Firstpage :
545
Abstract :
This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic not sufficiently covered by previous publications. We show (1) that straightforward maximum entropy models with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed relative frequency models; (2) that maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative frequency models with Kneser's advanced marginal backoff distribution (see IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Detroit, MI, vol. I, pp. 181-184, 1995), which explains some of the success reported for maximum entropy models in the past; and (3), through perplexity results for nested and non-nested features, e.g. trigrams and distance-trigrams, on a 4-million-word subset of the Wall Street Journal corpus, that the smoothing method has more effect on the perplexity than the method used to combine information.
Keywords :
grammars; maximum entropy methods; natural languages; smoothing methods; speech processing; Wall Street Journal corpus; bigrams; discounted feature; distance-trigrams; marginal backoff distribution; maximum entropy language modeling; nested features; nonnested features; perplexity results; smoothing methods; trigrams; unigrams; unsmoothed relative frequencies models; word subset; Entropy; Equations; Frequency; Geographic Information Systems; History; Iterative algorithms; Parameter estimation; Smoothing methods; Standards publication;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location :
Phoenix, AZ
ISSN :
1520-6149
Print_ISBN :
0-7803-5041-3
Type :
conf
DOI :
10.1109/ICASSP.1999.758183
Filename :
758183