DocumentCode :
3180017
Title :
An improved maximum entropy language model
Author :
Gaol, Wen ; Wen Gao
Author_Institution :
Dept. of Comput. Sci. & Eng., Harbin Inst. of Technol., China
Volume :
2
fYear :
2002
fDate :
26-30 Aug. 2002
Firstpage :
1083
Abstract :
An improved maximum entropy language model (IMELM) is presented, addressing three aspects of language modeling (LM) improvement: handling long-distance dependencies, integrating linguistic knowledge into the LM, and providing a general framework that combines all kinds of linguistic knowledge. In this paper, the proposed model combines a trigram model with base phrase structure knowledge. The trigram captures local relations between words, while base phrase structure knowledge represents long-distance relations between syntactic structures. Syntactic, semantic, and lexical knowledge are integrated within the maximum entropy framework. Experimental results show that the proposed model improves perplexity by 24% over the conventional trigram model.
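The abstract describes combining trigram and base phrase features in one maximum entropy model. As a minimal sketch, assuming the standard conditional maximum entropy form P(w|h) = exp(sum_i lambda_i f_i(h, w)) / Z(h), the Python below scores a word with one trigram feature and one long-distance phrase feature and computes perplexity. The feature functions (trigram_feature, phrase_feature), the phrase_head argument, and the weights are hypothetical illustrations, not IMELM's actual feature set; training the weights (for example by GIS or IIS) is omitted because the record does not describe it. The "24% improvement" is read here, as an assumption, as a relative perplexity reduction.

```python
import math


def trigram_feature(history, word):
    # Hypothetical local feature: fires on the (w_{i-2}, w_{i-1}, w_i) triple.
    return ("tri", history[-2], history[-1], word)


def phrase_feature(phrase_head, word):
    # Hypothetical long-distance feature: head of the preceding base phrase
    # paired with the predicted word (a stand-in for base phrase knowledge).
    return ("phr", phrase_head, word)


class MaxEntLM:
    """Conditional maximum entropy (log-linear) model over a fixed vocabulary."""

    def __init__(self, vocab, weights):
        self.vocab = list(vocab)
        self.weights = weights  # maps feature tuple -> lambda (untrained here)

    def score(self, history, phrase_head, word):
        # sum_i lambda_i * f_i(h, w) for binary features; unseen features weigh 0.
        feats = [trigram_feature(history, word), phrase_feature(phrase_head, word)]
        return sum(self.weights.get(f, 0.0) for f in feats)

    def prob(self, history, phrase_head, word):
        # Normalize over the vocabulary: Z(h) = sum_w' exp(score(h, w')).
        scores = {w: self.score(history, phrase_head, w) for w in self.vocab}
        m = max(scores.values())  # subtract the max for numerical stability
        z = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores[word] - m) / z


def perplexity(model, events):
    # events: list of (history, phrase_head, word); PP = exp(-mean log P).
    log_sum = sum(math.log(model.prob(h, p, w)) for h, p, w in events)
    return math.exp(-log_sum / len(events))


def relative_reduction(baseline_pp, new_pp):
    # Relative perplexity improvement of a new model over a baseline.
    return (baseline_pp - new_pp) / baseline_pp
```

With all weights at zero the model is uniform, so its perplexity equals the vocabulary size; nonzero weights on a feature raise the probability of the words that trigger it while the distribution still sums to one.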
Keywords :
feature extraction; grammars; maximum entropy methods; natural languages; speech processing; Chinese grammatical characteristics; IMELM; base phrase structure knowledge; feature selection; improved maximum entropy language model; language knowledge; language modeling; long-distance relations; model training; perplexity; syntactical structures; trigram model; word segmentation; Clustering algorithms; Computational complexity; Computer science; Entropy; Handicapped aids; Handwriting recognition; Knowledge engineering; Natural languages; Speech recognition; Training data;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Signal Processing, 2002 6th International Conference on
Print_ISBN :
0-7803-7488-6
Type :
conf
DOI :
10.1109/ICOSP.2002.1179977
Filename :
1179977