DocumentCode :
2768960
Title :
Investigating linguistic knowledge in a maximum entropy token-based language model
Author :
Cui, Jia ; Su, Yi ; Hall, Keith ; Jelinek, Frederick
Author_Institution :
Johns Hopkins Univ., Baltimore
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
171
Lastpage :
176
Abstract :
We present a novel language model capable of incorporating various types of linguistic information encoded in the form of tokens, where a token is a (word, label) tuple. Using tokens as hidden states, our model is effectively a hidden Markov model (HMM) that produces sequences of words with trivial output distributions. The transition probabilities, however, are computed using a maximum entropy model to take advantage of potentially overlapping features. We investigate different types of labels with a wide range of linguistic implications. These models outperform Kneser-Ney smoothed n-gram models both in perplexity on standard datasets and in word error rate for a large-vocabulary speech recognition system.
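To make the abstract's construction concrete, the following is a minimal Python sketch of the two ingredients it describes: a log-linear (maximum entropy) transition distribution over (word, label) tokens with overlapping features, and a forward sum that marginalizes the hidden labels, which is possible because each token emits its word deterministically. The label set, feature templates, and weights below are toy assumptions for illustration; the paper's actual feature set and weight training are not reproduced here.

```python
import math
from collections import defaultdict

LABELS = ["N", "V"]                      # toy label set (e.g. coarse POS tags)
WORDS = ["<s>", "dogs", "bark", "</s>"]  # toy word vocabulary
TOKENS = [(w, l) for w in WORDS for l in LABELS]

def features(prev_token, token):
    """Overlapping feature templates on a token transition (illustrative only)."""
    (pw, pl), (w, l) = prev_token, token
    return [("ww", pw, w), ("ll", pl, l), ("wl", pw, l), ("w", w), ("l", l)]

weights = defaultdict(float)     # feature weights (all zero => uniform model)
weights[("ll", "N", "V")] = 1.0  # toy weight: nouns tend to precede verbs

def transition_prob(prev_token, token):
    """Maximum entropy transition: P(token | prev) is proportional to
    exp(weights . features(prev, token)), normalized over all tokens."""
    score = lambda t: math.exp(sum(weights[f] for f in features(prev_token, t)))
    z = sum(score(t) for t in TOKENS)
    return score(token) / z

def word_sequence_prob(words):
    """Forward sum over hidden label sequences. Because a token (w, l) emits
    its word w with probability 1 (trivial output distribution), the word
    sequence probability is obtained by marginalizing out the labels."""
    alpha = {("<s>", l): 1.0 / len(LABELS) for l in LABELS}
    for w in words:
        alpha = {(w, l): sum(p * transition_prob(prev, (w, l))
                             for prev, p in alpha.items())
                 for l in LABELS}
    return sum(alpha.values())

print(word_sequence_prob(["dogs", "bark", "</s>"]))
```

In a real system the weights would be trained to maximize (regularized) likelihood on labeled text, and the normalization would run over the full token vocabulary, which is what makes the overlapping features tractable to combine without the independence assumptions of a count-based n-gram model.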
Keywords :
hidden Markov models; linguistics; maximum entropy methods; speech recognition; Kneser-Ney smoothed n-gram models; hidden Markov model; linguistic knowledge; maximum entropy token-based language model; speech recognition system; token encoding; Context modeling; Entropy; Error analysis; Hidden Markov models; Natural languages; Predictive models; Speech processing; Speech recognition; Testing; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430104
Filename :
4430104