Title :
An improved maximum entropy language model
Author :
Gao, Wen
Author_Institution :
Dept. of Comput. Sci. & Eng., Harbin Inst. of Technol., China
Abstract :
An improved maximum entropy language model (IMELM) is presented, motivated by three aspects of language modeling (LM) improvement: handling long-distance dependencies, integrating linguistic knowledge into the LM, and providing a general framework that combines all kinds of linguistic knowledge. In this paper, the proposed model combines a trigram model with base phrase structure knowledge. The trigram captures local relations between words, while base phrase structure knowledge represents long-distance relations between syntactic structures. Syntactic, semantic, and lexical (word-level) knowledge are integrated within the maximum entropy framework. Experimental results show that the proposed model achieves a 24% improvement in perplexity over the conventional trigram model.
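The abstract describes placing local trigram features and long-distance structural features in one maximum entropy model and evaluating it by perplexity. The Python sketch below illustrates only the general conditional maximum entropy form, p(w|h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h), together with per-word perplexity, assuming binary indicator features. All helper names (trigram_feature, trigger_feature, me_prob, perplexity) are hypothetical. trigger_feature is a simple long-distance trigger standing in for the base phrase structure features, whose exact definition the abstract does not give. The weights are set by hand here rather than trained (in practice they would be estimated, e.g. with GIS or IIS), so this is not the authors' implementation.

```python
import math
from typing import Callable, Sequence, Tuple

# A feature function f(h, w) -> value, where h is the word history (a tuple)
# and w is the candidate next word.
Feature = Callable[[Tuple[str, ...], str], float]


def trigram_feature(u: str, v: str, w: str) -> Feature:
    """Indicator that fires when the last two history words are (u, v) and the next word is w."""
    def f(h: Tuple[str, ...], cand: str) -> float:
        return 1.0 if h[-2:] == (u, v) and cand == w else 0.0
    return f


def trigger_feature(trigger: str, w: str, window: int = 2) -> Feature:
    """Hypothetical long-distance feature (a stand-in, not the paper's base phrase feature):
    fires when `trigger` occurs in the history OUTSIDE the last `window` words
    (i.e. beyond the trigram's reach) and the next word is w."""
    def f(h: Tuple[str, ...], cand: str) -> float:
        distant = h[:-window] if window else h
        return 1.0 if trigger in distant and cand == w else 0.0
    return f


def me_prob(h: Tuple[str, ...], w: str, vocab: Sequence[str],
            features: Sequence[Feature], weights: Sequence[float]) -> float:
    """p(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h), with Z(h) summed over vocab."""
    scores = {c: sum(lam * f(h, c) for f, lam in zip(features, weights)) for c in vocab}
    m = max(scores.values())  # shift by the max score for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[w] - m) / z


def perplexity(sentence: Sequence[str], vocab: Sequence[str],
               features: Sequence[Feature], weights: Sequence[float]) -> float:
    """exp of the average negative log-probability per word, padding with two <s> tokens."""
    padded = ["<s>", "<s>"] + list(sentence)
    log_prob = 0.0
    for i in range(2, len(padded)):
        log_prob += math.log(me_prob(tuple(padded[:i]), padded[i], vocab, features, weights))
    return math.exp(-log_prob / len(sentence))


vocab = ["the", "cat", "sat", "on", "mat"]
sent = ["the", "cat", "sat", "on", "mat"]
feats = [trigram_feature("<s>", "the", "cat"), trigger_feature("cat", "mat")]
# With all weights zero the model is uniform, so perplexity equals |vocab| = 5 (up to float rounding).
print(perplexity(sent, vocab, feats, [0.0, 0.0]))
# Positive weights on features that fire for this sentence lower its perplexity below 5.
print(perplexity(sent, vocab, feats, [1.5, 1.5]))
```

The design point the sketch tries to show is that both kinds of knowledge enter the model the same way, as weighted feature functions sharing one normalizer Z(h), which is what lets a maximum entropy framework combine heterogeneous knowledge sources without ad hoc interpolation.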
Keywords :
feature extraction; grammars; maximum entropy methods; natural languages; speech processing; Chinese grammatical characteristics; IMELM; base phrase structure knowledge; feature selection; improved maximum entropy language model; language knowledge; language modeling; long-distance relations; model training; perplexity; syntactical structures; trigram model; word segmentation; Clustering algorithms; Computational complexity; Computer science; Entropy; Handicapped aids; Handwriting recognition; Knowledge engineering; Natural languages; Speech recognition; Training data;
Conference_Title :
Signal Processing, 2002 6th International Conference on
Print_ISBN :
0-7803-7488-6
DOI :
10.1109/ICOSP.2002.1179977