DocumentCode :
2700472
Title :
Large-Scale Distributed Language Modeling
Author :
Emami, Ali ; Papineni, K. ; Sorensen, Julian
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
A novel distributed language model that has no constraints on the n-gram order and no practical constraints on vocabulary size is presented. This model is scalable and allows for an arbitrarily large corpus to be queried for statistical estimates. Our distributed model is capable of producing n-gram counts on demand. By using a novel heuristic estimate for the interpolation weights of a linearly interpolated model, it is possible to dynamically compute the language model probabilities. The distributed architecture follows the client-server paradigm and allows for each client to request an arbitrary weighted mixture of the corpus. This allows easy adaptation of the language model to particular test conditions. Experiments using the distributed LM for re-ranking N-best lists of a speech recognition system resulted in considerable improvements in word error rate (WER), while integration with a machine translation decoder resulted in significant improvements in translation quality as measured by the BLEU score.
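A minimal sketch of the idea the abstract describes: n-gram counts fetched on demand from several corpus partitions, mixed with per-client weights, and combined into a linearly interpolated probability. The abstract does not give the authors' heuristic for the interpolation weights, so the fixed `lambdas` below are a placeholder assumption. The names `CorpusShard`, `mixed_count`, and `interpolated_prob` are hypothetical, and in-process objects stand in for the remote servers.

```python
from collections import Counter


def ngram_counts(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


class CorpusShard:
    """Hypothetical stand-in for one server holding part of the corpus."""

    def __init__(self, tokens, max_order=3):
        self.counts = ngram_counts(tokens, max_order)
        self.total = len(tokens)

    def count(self, ngram):
        # In a real client-server setup this would be a network request.
        return self.counts[tuple(ngram)]


def mixed_count(shards, weights, ngram):
    """Weighted sum of one n-gram's counts across shards (the client's mixture)."""
    return sum(w * s.count(ngram) for s, w in zip(shards, weights))


def interpolated_prob(shards, weights, history, word, lambdas):
    """Linear interpolation of relative-frequency estimates.

    lambdas[k] weights the estimate that conditions on the last k history
    words. Assumes len(history) >= len(lambdas) - 1 and sum(lambdas) == 1.
    The fixed lambdas are an assumption, not the paper's heuristic.
    """
    total_tokens = sum(w * s.total for s, w in zip(shards, weights))
    prob = 0.0
    for k, lam in enumerate(lambdas):
        ctx = tuple(history[len(history) - k:]) if k else ()
        num = mixed_count(shards, weights, ctx + (word,))
        den = mixed_count(shards, weights, ctx) if k else total_tokens
        if den > 0:  # skip orders whose context was never observed
            prob += lam * num / den
    return prob


# Two toy partitions; the client asks for a 1.0 / 0.5 mixture of them.
shards = [CorpusShard("a b a b a c".split()), CorpusShard("a c a c".split())]
p = interpolated_prob(shards, [1.0, 0.5], ["a"], "b", [0.5, 0.5])
```

Changing the `weights` argument per request is what would let each client adapt the model to its own test condition without rebuilding any stored counts.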
Keywords :
client-server systems; decoding; interpolation; language translation; natural language processing; speech coding; speech recognition; statistical analysis; client-server paradigm; interpolation weights; language model probabilities; large-scale distributed language modeling; machine translation decoder; n-gram order; re-ranking N-best lists; speech recognition system; word error rate; Automatic speech recognition; Error analysis; Large-scale systems; Natural languages; Probability; Surface-mount technology; Training data; Vocabulary; Distributed memory systems; Statistical language modeling; Statistical machine translation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367157
Filename :
4218031