• DocumentCode
    336823
  • Title
    Improved topic-dependent language modeling using information retrieval techniques
  • Author
    Mahajan, Milind; Beeferman, Doug; Huang, X.D.
  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • Volume
    1
  • fYear
    1999
  • fDate
    15-19 Mar 1999
  • Firstpage
    541
  • Abstract
    N-gram language models are frequently used by speech recognition systems to constrain and guide the search. N-gram models use only the last N-1 words to predict the next word; typical values of N range from 2 to 4. N-gram language models thus lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique with experiments on the Wall Street Journal text corpus, a relatively difficult task for topic-dependent language modeling because the text is relatively homogeneous. The proposed method can reduce the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model.
  • Keywords
    grammars; information retrieval; natural languages; speech recognition; N-gram language models; Wall Street Journal text corpus; baseline language model; experiments; information retrieval techniques; long-term context information; model perplexity prediction; probability; speech recognition systems; topic-dependent language modeling; Context modeling; Entropy; History; Information retrieval; Natural languages; Power system modeling; Predictive models; Speech recognition; Training data; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-5041-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1999.758182
  • Filename
    758182