• DocumentCode
    3485294
  • Title
    Efficient representation and fast look-up of Maximum Entropy language models
  • Author
    Cui, Jia; Chen, Stanley; Zhou, Bowen

  • Author_Institution
    IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    231
  • Lastpage
    236
  • Abstract
    Word class information has long been proven useful in language modeling (LM). However, the improved performance of class-based LMs over word n-gram models generally comes at the cost of increased decoding complexity and model size. In this paper, we propose a modified version of the Maximum Entropy token-based language model of [1] that matches the performance of the best existing class-based models but is as fast for decoding as a word n-gram model. In addition, while it is easy to statically combine word n-gram models built on different corpora into a single word n-gram model for fast decoding, it is not known how to statically combine class-based LMs effectively. A further contribution of this paper is a novel combination method that retains the gain of class-based LMs over word n-gram models. Experimental results on several spoken language translation tasks show that our model performs significantly better than word n-gram models, with comparable decoding speed and only a modest increase in model size.
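    As background for the static combination discussed in the abstract: merging word n-gram models built on different corpora is typically done by linear interpolation of their probabilities. The sketch below illustrates only that standard baseline technique, not the paper's novel class-based combination method; the function name, weights, and toy bigram tables are illustrative assumptions.

        # Minimal sketch of static linear interpolation of two word n-gram
        # models into a single model, the "easy" baseline the abstract
        # contrasts with class-based LM combination. All names, weights,
        # and probabilities here are illustrative, not from the paper.

        def interpolate_ngram_models(model_a, model_b, weight_a=0.6):
            """Merge two {ngram: probability} dicts with fixed weights."""
            weight_b = 1.0 - weight_a
            combined = {}
            for ngram in set(model_a) | set(model_b):
                p_a = model_a.get(ngram, 0.0)
                p_b = model_b.get(ngram, 0.0)
                combined[ngram] = weight_a * p_a + weight_b * p_b
            return combined

        # Toy bigram models standing in for LMs trained on two corpora.
        news_lm = {("the", "market"): 0.02, ("the", "game"): 0.001}
        sports_lm = {("the", "market"): 0.001, ("the", "game"): 0.03}

        merged = interpolate_ngram_models(news_lm, sports_lm, weight_a=0.6)
        print(merged[("the", "game")])  # 0.6*0.001 + 0.4*0.03 = 0.0126

    Because the merged table is itself an ordinary word n-gram model, it can be queried at the same speed as either input model, which is the property the paper seeks to preserve for class-based LMs.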
  • Keywords
    decoding; entropy; speech recognition; class-based LM; decoding speed; maximum entropy token-based language model; n-gram models; word class information; Computational modeling; Data models; Decoding; History; Interpolation; Training; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
  • Conference_Location
    Waikoloa, HI, USA
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type
    conf
  • DOI
    10.1109/ASRU.2011.6163936
  • Filename
    6163936