• DocumentCode
    2259050
  • Title

    A novel interpolated N-gram language model based on class hierarchy

  • Author

    Lv, Zhenyu ; Liu, Wenju ; Yang, Zhanlei

  • Author_Institution
    Inst. of Autom., Chinese Acad. of Sci., Beijing, China
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In this paper, we propose a novel interpolated language model that combines interpolation and backing-off along a class hierarchy. A corresponding approach to estimating the interpolation coefficients is also presented. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree is used to balance the generalization ability of classes against word specificity when estimating the likelihood of an n-gram event. Experiments are performed on the Reuters corpus using a vocabulary of 27,000 words. Results show a 12% reduction in test perplexity relative to the standard modified Kneser-Ney (KN) n-gram approach.
  • Keywords
    generalisation (artificial intelligence); interpolation; pattern clustering; speech recognition; trees (mathematics); class hierarchy; generalization ability; interpolated N-gram language model; interpolation coefficient estimation; minimum discriminative information method; n-gram event likelihood estimation; speech recognition; vocabulary clustering; word-clustering tree; Automation; Clustering algorithms; Frequency estimation; Interpolation; Natural languages; Predictive models; Smoothing methods; Speech recognition; Testing; Vocabulary; Language model; back-off; class hierarchy; cluster; interpolate;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313739
  • Filename
    5313739