DocumentCode :
290056
Title :
Ergodic hidden Markov models and polygrams for language modeling
Author :
Kuhn, T. ; Niemann, H. ; Schukat-Talamazzini, E.G.
Author_Institution :
Erlangen-Nurnberg Univ., Germany
Volume :
i
fYear :
1994
fDate :
19-22 Apr 1994
Abstract :
Presents two new techniques for language modeling in speech recognition. The first technique is based on ergodic discrete density hidden Markov models (HMMs), which can be applied to bigrams based on word categories. This statistical formulation, the so-called Markov bigrams, enables an efficient unsupervised learning procedure for the bigram probabilities using the well-known Baum-Welch algorithm. Furthermore, maximizing the model-conditional probability is equivalent to minimizing the perplexity of the training corpus. The second technique is based on polygrams, which extend the bigram (n=2) and trigram (n=3) grammars to arbitrary values of n. Following the smoothing techniques used for bigram and trigram models, the probabilities of the n-grams in the polygram model are interpolated from the relative frequencies of all n′-grams with n′ ≤ n. Both techniques were evaluated on the ATIS corpus by computing the test set perplexity. Furthermore, the authors integrated both the Markov bigrams and the polygrams into their word recognizer for continuous speech. Experimental results on a German database are discussed, using the N-best paradigm to reorder the generated word sequences according to the sentence probability given by the language model.
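The abstract states that polygram probabilities are interpolated from the relative frequencies of all n′-grams with n′ ≤ n, and that the models are compared by test set perplexity, PP = exp(−(1/N) Σ log p(w_i | h_i)). The Python sketch below illustrates both ideas under loudly stated assumptions: the interpolation weights `lambdas` are fixed, hand-chosen values (the abstract does not say how the authors obtain them), a uniform order-0 term is added so unseen words keep a nonzero probability, sentences are padded with hypothetical `<s>`/`</s>` symbols, and the class name `Polygram` is illustrative only. It is not the authors' implementation.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # padding / end-of-sentence symbols (assumed)


class Polygram:
    """Sketch of an interpolated polygram model (hypothetical class).

    p(w | h) = sum_k lambdas[k] * f_k(w | h) / sum of the weights used, where f_0
    is uniform over the vocabulary and f_k (k >= 1) is the relative frequency of
    w after the last k-1 words of h. Orders whose history was never seen in
    training are dropped and the remaining weights renormalized.
    """

    def __init__(self, corpus, n, lambdas):
        assert len(lambdas) == n + 1 and abs(sum(lambdas) - 1.0) < 1e-9
        self.n, self.lambdas = n, lambdas
        self.ngram = Counter()    # c(h + (w,)) for history lengths 0..n-1
        self.context = Counter()  # c(h): how often h was followed by a word
        vocab = {EOS}
        for sentence in corpus:
            vocab.update(sentence)
            tokens = [BOS] * (n - 1) + list(sentence) + [EOS]
            for i in range(n - 1, len(tokens)):
                for k in range(1, n + 1):            # k = n'-gram order, n' <= n
                    h = tuple(tokens[i - k + 1:i])   # the k-1 preceding tokens
                    self.ngram[h + (tokens[i],)] += 1
                    self.context[h] += 1
        self.V = len(vocab) + 1  # +1 reserves uniform mass for unseen words

    def prob(self, history, word):
        """Interpolated p(word | history); history holds the n-1 previous tokens."""
        num = self.lambdas[0] / self.V  # uniform floor (order 0)
        den = self.lambdas[0]
        for k in range(1, self.n + 1):
            h = tuple(history[len(history) - k + 1:])  # last k-1 tokens
            c_h = self.context[h]
            if c_h == 0:
                continue  # unseen history: skip this order, renormalize below
            num += self.lambdas[k] * self.ngram[h + (word,)] / c_h
            den += self.lambdas[k]
        return num / den

    def perplexity(self, corpus):
        """Test set perplexity: exp of the mean negative log-probability per token."""
        log_sum, count = 0.0, 0
        for sentence in corpus:
            tokens = [BOS] * (self.n - 1) + list(sentence) + [EOS]
            for i in range(self.n - 1, len(tokens)):
                history = tuple(tokens[i - self.n + 1:i])
                log_sum += math.log(self.prob(history, tokens[i]))
                count += 1
        return math.exp(-log_sum / count)


# Toy usage with made-up sentences and weights (not ATIS data).
train = [["show", "me", "flights"], ["show", "me", "fares"]]
model = Polygram(train, n=3, lambdas=[0.1, 0.2, 0.3, 0.4])
print(model.perplexity([["show", "me", "flights"]]))
```

Dropping orders whose history is unseen and renormalizing the remaining weights keeps p(· | h) a proper distribution over the training vocabulary plus one unknown-word slot. In practice, one common choice is to tune such interpolation weights on held-out data rather than fixing them by hand.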
Keywords :
computational linguistics; grammars; hidden Markov models; natural languages; probability; speech recognition; unsupervised learning; ATIS corpus; Baum-Welch algorithm; German database; N-best paradigm; bigrams; ergodic discrete density hidden Markov models; grammars; interpolation; language modeling; model-conditional probability; n-grams; perplexity; polygrams; sentence probability; smoothing techniques; speech recognition; statistical approach; training corpus; trigram; unsupervised learning procedure; word categories; word recognizer; Databases; Frequency estimation; Hidden Markov models; History; Natural languages; Probability; Robustness; Smoothing methods; Speech recognition; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on
Conference_Location :
Adelaide, SA
ISSN :
1520-6149
Print_ISBN :
0-7803-1775-0
Type :
conf
DOI :
10.1109/ICASSP.1994.389282
Filename :
389282