DocumentCode :
2704423
Title :
On Compressing N-Gram Language Models
Author :
Hirsimaki, T.
Author_Institution :
Adaptive Inf. Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
In large-vocabulary speech recognition systems, a large n-gram language model typically consumes the major part of the memory resources. Representing the language model compactly is therefore important in recognition systems targeted at small devices with limited memory. This paper extends the compressed language model structure proposed earlier by Whittaker and Raj. By separating the n-grams that are prefixes of longer n-grams, redundant information can be omitted. Experiments on English 4-gram models and Finnish 6-gram models show that the extended structure achieves lossless memory reductions of up to 30% compared to the baseline structure of Whittaker and Raj.
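To make the separation idea concrete, the following is a minimal Python sketch, not the compressed array layout of Whittaker and Raj or the paper's exact structure: it assumes a plain in-memory dict keyed by word tuples and simply splits the entries into n-grams that are prefixes of longer stored n-grams (which keep fields such as a backoff weight) and leaf n-grams (for which that field can be omitted). The function name split_prefix_ngrams and the toy data are illustrative only.

```python
# Illustrative sketch: split an n-gram table into "prefix" n-grams (those that
# are a proper prefix of some longer stored n-gram) and "leaf" n-grams, so that
# per-entry fields needed only for prefixes (e.g. backoff weights) are not
# stored for the leaves. This is an assumption-level toy, not the paper's
# actual compressed data structure.

def split_prefix_ngrams(ngrams):
    """ngrams: dict mapping word-tuple -> (logprob, backoff).
    Returns (prefix_table, leaf_table)."""
    # Collect every proper prefix of every stored n-gram.
    prefixes = set()
    for words in ngrams:
        for i in range(1, len(words)):
            prefixes.add(words[:i])

    prefix_table = {}  # keeps logprob and backoff
    leaf_table = {}    # keeps logprob only; the backoff field is omitted
    for words, (logprob, backoff) in ngrams.items():
        if words in prefixes:
            prefix_table[words] = (logprob, backoff)
        else:
            leaf_table[words] = logprob
    return prefix_table, leaf_table

if __name__ == "__main__":
    toy = {
        ("the",): (-1.0, -0.5),
        ("the", "cat"): (-2.0, -0.4),
        ("the", "cat", "sat"): (-2.5, 0.0),
        ("a", "dog"): (-2.2, 0.0),
    }
    prefix_table, leaf_table = split_prefix_ngrams(toy)
    print(len(prefix_table), "prefix n-grams,", len(leaf_table), "leaf n-grams")
```

In this toy example only ("the",) and ("the", "cat") end up in the prefix table; the remaining highest-order n-grams are stored with fewer fields, which is the kind of redundancy the paper's extended structure exploits.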
Keywords :
data compression; natural language processing; speech coding; speech recognition; English 4-gram models; Finnish 6-gram models; compressing n-gram language models; large-vocabulary speech recognition systems; Data compression; Data structures; Entropy; Informatics; Modeling; Natural languages; Speech recognition; Target recognition; Text recognition; Tree data structures; Vocabulary
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367228
Filename :
4218259