DocumentCode
353702
Title
Putting it all together: language model combination
Author
Goodman, Joshua T.
Author_Institution
Speech Technol. Group, Microsoft Corp., Redmond, WA, USA
Volume
3
fYear
2000
fDate
2000
Firstpage
1647
Abstract
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, modified Kneser-Ney smoothing, and clustering. While all of these techniques have been studied separately, they have rarely been studied in combination. We find some significant interactions, especially with smoothing techniques. The combination of all techniques leads to up to a 45% perplexity reduction over a Katz (1987) smoothed trigram model with no count cutoffs, the highest such perplexity reduction reported.
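For context on the evaluation metric and the idea of combining models, the following is a minimal sketch of one standard combination method, linear interpolation of n-gram models, together with a perplexity computation. It is an illustration only, not the paper's implementation: the add-alpha smoothing (a stand-in for Katz or modified Kneser-Ney), the interpolation weight, and the toy corpus are all assumptions made to keep the example runnable.

import math
from collections import Counter

def train_unigram(tokens):
    # Maximum-likelihood unigram model: p(w) = count(w) / N.
    counts = Counter(tokens)
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0

def train_bigram(tokens, vocab_size, alpha=0.1):
    # Add-alpha smoothed bigram; a simple stand-in for the Katz or
    # modified Kneser-Ney smoothing discussed in the abstract.
    bi = Counter(zip(tokens, tokens[1:]))
    uni = Counter(tokens)
    return lambda prev, w: (bi[(prev, w)] + alpha) / (uni[prev] + alpha * vocab_size)

def interpolated_prob(prev, w, unigram, bigram, lam=0.7):
    # Linear interpolation: p = lam * p_bigram + (1 - lam) * p_unigram.
    return lam * bigram(prev, w) + (1.0 - lam) * unigram(w)

def perplexity(tokens, unigram, bigram, lam=0.7):
    # Perplexity = exp of the average negative log-probability per token.
    log_sum = 0.0
    for prev, w in zip(tokens, tokens[1:]):
        log_sum += math.log(interpolated_prob(prev, w, unigram, bigram, lam))
    return math.exp(-log_sum / (len(tokens) - 1))

train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the rug".split()
uni = train_unigram(train)
bi = train_bigram(train, vocab_size=len(set(train)))
print(f"perplexity: {perplexity(test, uni, bi):.2f}")

In practice the interpolation weight lam would be tuned on held-out data rather than fixed; the paper's reported 45% reduction comes from combining many techniques, not from interpolation alone.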
Keywords
linguistics; natural languages; nomograms; pattern clustering; smoothing methods; speech processing; caching; clustering; count cutoffs; higher-order n-grams; language model combination; modified Kneser-Ney smoothing; perplexity reduction; skipping; smoothing techniques; trigram models; History; Interpolation; Smoothing methods; Speech recognition; Training data
fLanguage
English
Publisher
ieee
Conference_Title
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), Proceedings
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.862064
Filename
862064