Title :
A novel interpolated N-gram language model based on class hierarchy
Author :
Lv, Zhenyu ; Liu, Wenju ; Yang, Zhanlei
Author_Institution :
Inst. of Autom., Chinese Acad. of Sci., Beijing, China
Abstract :
In this paper, we propose a novel interpolated language model that combines interpolation with backing off along the classes of a class hierarchy. A corresponding approach to estimating the interpolation coefficients is also presented. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree is used to balance the generalization ability of classes against the specificity of words when estimating the likelihood of an n-gram event. Experiments are performed on the Reuters corpus using a vocabulary of 27,000 words. Results show a 12% reduction in test perplexity relative to the standard modified Kneser-Ney (KN) n-gram approach.
Keywords :
generalisation (artificial intelligence); interpolation; pattern clustering; speech recognition; trees (mathematics); class hierarchy; generalization ability; interpolated N-gram language model; interpolation coefficient estimation; minimum discriminative information method; n-gram event likelihood estimation; speech recognition; vocabulary clustering; word-clustering tree; Automation; Clustering algorithms; Frequency estimation; Interpolation; Natural languages; Predictive models; Smoothing methods; Speech recognition; Testing; Vocabulary; Language model; back-off; class hierarchy; cluster; interpolate;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
DOI :
10.1109/NLPKE.2009.5313739
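
Illustrative sketch :
The abstract describes estimating an n-gram probability by mixing evidence from a word and from the increasingly general classes above it in a clustering tree. The Python sketch below shows that general idea for bigrams only. It is not the authors' formulation: the toy hierarchy, the fixed weights in `lambdas`, the renormalization over seen contexts, and the names `HIERARCHY`, `train`, and `interpolated_prob` are all assumptions made for illustration. The MDI clustering and the paper's coefficient estimation are not reproduced.

```python
from collections import Counter

# Hypothetical toy hierarchy (an assumption, not from the paper).
# Each word lists its ancestors from most specific to most general:
# level 0 is the word itself, the last level is a root class of all words.
HIERARCHY = {
    "monday": ["monday", "DAY", "ROOT"],
    "friday": ["friday", "DAY", "ROOT"],
    "on": ["on", "PREP", "ROOT"],
    "in": ["in", "PREP", "ROOT"],
    "meet": ["meet", "VERB", "ROOT"],
}


def train(bigrams):
    """Count (history class, word) pairs at every level of the hierarchy."""
    pair_counts, ctx_counts = Counter(), Counter()
    for history, word in bigrams:
        for level, cls in enumerate(HIERARCHY[history]):
            pair_counts[(level, cls, word)] += 1
            ctx_counts[(level, cls)] += 1
    return pair_counts, ctx_counts


def interpolated_prob(word, history, pair_counts, ctx_counts, lambdas):
    """Mix maximum-likelihood estimates from the word level up to the root.

    Levels whose context was never seen are skipped, and the remaining
    weights are renormalized so the result is still a distribution.
    """
    weighted, mass = 0.0, 0.0
    for level, cls in enumerate(HIERARCHY[history]):
        ctx = ctx_counts[(level, cls)]
        if ctx == 0:
            continue  # unseen context: rely on the more general levels
        weighted += lambdas[level] * pair_counts[(level, cls, word)] / ctx
        mass += lambdas[level]
    return weighted / mass if mass else 0.0


# Training pairs are (history, word); the weights are fixed by hand here.
data = [("on", "monday"), ("on", "friday"), ("in", "monday"), ("meet", "on")]
pair_counts, ctx_counts = train(data)
lambdas = (0.5, 0.3, 0.2)
p = interpolated_prob("friday", "in", pair_counts, ctx_counts, lambdas)
```

In this toy data "friday" never follows "in", so a word-level maximum-likelihood estimate would be zero. The class-level terms (the PREP class and the root) still contribute, giving a probability of about 0.15, which illustrates the trade-off between class generalization and word specificity that the abstract mentions.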