DocumentCode
2839386
Title
A maximum entropy approach for integrating semantic information in statistical language models
Author
Chueh, Chuang-Hua ; Chien, Jen-Tzung ; Wang, Hsin-Min
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Cheng Kung Univ., Tainan, Taiwan
fYear
2004
fDate
15-18 Dec. 2004
Firstpage
309
Lastpage
312
Abstract
In this paper, we propose an adaptive statistical language model that incorporates semantic information into an n-gram model. Traditional n-gram models exploit only the immediate context of the history. We first introduce the semantic topic as a new source from which to extract long-distance information for language modeling, and then adopt the maximum entropy (ME) approach, instead of the conventional linear interpolation method, to integrate the semantic information with the n-gram model. In the ME approach, each information source gives rise to a set of constraints, which should be satisfied to obtain the hybrid model. In the experiments, the ME language models, trained on the China Times newswire corpus, achieved a 40% perplexity reduction over the baseline bigram model.
Keywords
linguistics; maximum entropy methods; natural languages; ME language models; adaptive statistical language model; information source constraints; long distance information extraction; maximum entropy method; n-gram model; natural language regularities; perplexity reduction; semantic information; Automatic speech recognition; Computer science; Context modeling; Data mining; Electronic mail; Entropy; History; Information science; Interpolation; Natural languages
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN
0-7803-8678-7
Type
conf
DOI
10.1109/CHINSL.2004.1409648
Filename
1409648