DocumentCode :
2704423
Title :
On Compressing N-Gram Language Models
Author :
Hirsimaki, T.
Author_Institution :
Adaptive Inf. Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
In large-vocabulary speech recognition systems, a large n-gram language model typically consumes the major part of the memory resources. Representing the language model compactly is therefore important in recognition systems targeted at small devices with limited memory. This paper extends the compressed language model structure proposed earlier by Whittaker and Raj. By separating the n-grams that are prefixes of longer n-grams, redundant information can be omitted. Experiments on English 4-gram models and Finnish 6-gram models show that the extended structure achieves lossless memory reductions of up to 30% compared to the baseline structure of Whittaker and Raj.
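To make the separation idea concrete, the following is a minimal Python sketch, not the compressed array layout of Whittaker and Raj or the paper's exact structure: it assumes a plain in-memory dict keyed by word tuples and simply splits the entries into n-grams that are prefixes of longer stored n-grams (which keep fields such as a backoff weight) and leaf n-grams (for which that field can be omitted). The function name split_prefix_ngrams and the toy data are illustrative only.

```python
# Illustrative sketch: split an n-gram table into "prefix" n-grams (those that
# are a proper prefix of some longer stored n-gram) and "leaf" n-grams, so that
# per-entry fields needed only for prefixes (e.g. backoff weights) are not
# stored for the leaves. This is an assumption-level toy, not the paper's
# actual compressed data structure.

def split_prefix_ngrams(ngrams):
    """ngrams: dict mapping word-tuple -> (logprob, backoff).
    Returns (prefix_table, leaf_table)."""
    # Collect every proper prefix of every stored n-gram.
    prefixes = set()
    for words in ngrams:
        for i in range(1, len(words)):
            prefixes.add(words[:i])

    prefix_table = {}  # keeps logprob and backoff
    leaf_table = {}    # keeps logprob only; the backoff field is omitted
    for words, (logprob, backoff) in ngrams.items():
        if words in prefixes:
            prefix_table[words] = (logprob, backoff)
        else:
            leaf_table[words] = logprob
    return prefix_table, leaf_table

if __name__ == "__main__":
    toy = {
        ("the",): (-1.0, -0.5),
        ("the", "cat"): (-2.0, -0.4),
        ("the", "cat", "sat"): (-2.5, 0.0),
        ("a", "dog"): (-2.2, 0.0),
    }
    prefix_table, leaf_table = split_prefix_ngrams(toy)
    print(len(prefix_table), "prefix n-grams,", len(leaf_table), "leaf n-grams")
```

In this toy example only ("the",) and ("the", "cat") end up in the prefix table; the remaining highest-order n-grams are stored with fewer fields, which is the kind of redundancy the paper's extended structure exploits.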
Keywords :
data compression; natural language processing; speech coding; speech recognition; English 4-gram models; Finnish 6-gram models; compressing n-gram language models; large-vocabulary speech recognition systems; Data compression; Data structures; Entropy; Informatics; Modeling; Natural languages; Speech recognition; Target recognition; Text recognition; Tree data structures; Vocabulary
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367228
Filename :
4218259