Title :
Techniques for approximating a trigram language model
Author :
Brugnara, Fabio ; Federico, Marcello
Author_Institution :
Inst. per la Ricerca Sci. e Tecnol., Trento, Italy
Abstract :
Several methods are proposed for reducing the size of a trigram language model (LM), which is often the biggest data structure in a continuous speech recognizer, without affecting its performance. The approaches share a common idea: select only a subset of the available trigrams, trying to identify those that contribute most to the performance of the full trigram LM. The proposed selection criteria apply to trigram contexts of length one or two. These criteria rely on information theory concepts, on the back-off probabilities estimated by the LM, or on a measure of the phonetic/linguistic uncertainty relative to a given context. The performance of the reduced trigram LMs is compared in terms of both perplexity and recognition accuracy. Results show that all the considered methods perform better than the naive frequency shifting method. In fact, a 50% size reduction is obtained on a shift-1 trigram LM, at the cost of a 5% increase in word error rate. Moreover, the reduced LMs improve on the word error rate of a bigram LM of the same size by around 15%.
Keywords :
information theory; natural languages; probability; search problems; speech recognition; stochastic processes; word processing; back-off probabilities; bigram LM; continuous speech recognizer; data structure; information theory concepts; naive frequency shifting method; perplexity; phonetic/linguistic uncertainty; recognition accuracy; reduced trigram LMs; selection criteria; shift-1 trigram LM; trigram contexts; trigram language model approximation; word error rate; Counting circuits; Data structures; Dictionaries; Error analysis; Frequency; Information theory; Natural languages; Probability; Speech; Statistics;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607210
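
Illustrative sketch (not from the paper):
The abstract describes keeping only the trigrams whose contexts matter most and letting the rest fall back to the bigram model. The sketch below shows one way such a context selection could look. It is an assumption, not the authors' criterion: it scores each two-word context by the count-weighted Kullback-Leibler divergence between its unsmoothed trigram distribution and the bigram distribution it would back off to, and keeps the top-scoring contexts. The function name `select_contexts` and the parameter `n_keep` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_contexts(tokens, n_keep):
    """Keep the n_keep two-word contexts whose trigram distribution
    differs most from the bigram distribution used as fallback.

    Assumption: unsmoothed maximum-likelihood estimates and a
    count-weighted KL divergence score, chosen only for illustration.
    """
    tri = defaultdict(Counter)  # (u, v) -> counts of the next word
    bi = defaultdict(Counter)   # v -> counts of the next word
    for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
        tri[(u, v)][w] += 1
    for v, w in zip(tokens, tokens[1:]):
        bi[v][w] += 1

    scores = {}
    for (u, v), nxt in tri.items():
        n_ctx = sum(nxt.values())
        n_bi = sum(bi[v].values())
        kl = 0.0
        for w, c in nxt.items():
            p_tri = c / n_ctx
            # Never zero: each observed trigram (u, v, w) contains bigram (v, w).
            p_bi = bi[v][w] / n_bi
            kl += p_tri * math.log(p_tri / p_bi)
        scores[(u, v)] = n_ctx * kl

    # Highest score first; ties broken by context for determinism.
    ranked = sorted(scores, key=lambda ctx: (-scores[ctx], ctx))
    kept = set(ranked[:n_keep])

    # Trigrams in dropped contexts would be served by the bigram model.
    pruned = {ctx: dict(tri[ctx]) for ctx in kept}
    return kept, scores, pruned


tokens = "a b c a b d x b c x b c".split()
kept, scores, pruned = select_contexts(tokens, n_keep=2)
print(sorted(kept))
```

In this toy corpus, contexts whose trigram distribution equals the bigram fallback score zero and are dropped at no cost, which is the intuition behind selecting contexts rather than thresholding raw frequencies. A real system would use smoothed back-off estimates and evaluate the pruned model by perplexity and word error rate, as the abstract reports.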