DocumentCode
2259050
Title
A novel interpolated N-gram language model based on class hierarchy
Author
Lv, Zhenyu ; Liu, Wenju ; Yang, Zhanlei
Author_Institution
Inst. of Autom., Chinese Acad. of Sci., Beijing, China
fYear
2009
fDate
24-27 Sept. 2009
Firstpage
1
Lastpage
5
Abstract
In this paper, we propose a novel interpolated language model that combines interpolation and backing-off along a hierarchy of word classes. A corresponding approach to estimating the interpolation coefficients is also presented. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree is used to balance the generalization ability of classes against word specificity when estimating the likelihood of an n-gram event. Experiments are performed on the Reuters corpus using a vocabulary of 27,000 words. Results show a 12% reduction in test perplexity over the standard modified Kneser-Ney (KN) n-gram approach.
Keywords
generalisation (artificial intelligence); interpolation; pattern clustering; speech recognition; trees (mathematics); class hierarchy; generalization ability; interpolated N-gram language model; interpolation coefficient estimation; minimum discriminative information method; n-gram event likelihood estimation; speech recognition; vocabulary clustering; word-clustering tree; Automation; Clustering algorithms; Frequency estimation; Interpolation; Natural languages; Predictive models; Smoothing methods; Speech recognition; Testing; Vocabulary; Language model; back-off; class hierarchy; cluster; interpolate;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-4538-7
Electronic_ISBN
978-1-4244-4540-0
Type
conf
DOI
10.1109/NLPKE.2009.5313739
Filename
5313739