  • DocumentCode
    2839386
  • Title
    A maximum entropy approach for integrating semantic information in statistical language models
  • Author
    Chueh, Chuang-Hua; Chien, Jen-Tzung; Wang, Hsin-Min
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Cheng Kung Univ., Tainan, Taiwan
  • fYear
    2004
  • fDate
    15-18 Dec. 2004
  • Firstpage
    309
  • Lastpage
    312
  • Abstract
    In this paper, we propose an adaptive statistical language model that incorporates semantic information into an n-gram model. Traditional n-gram models exploit only the immediately preceding context. We first introduce the semantic topic as a new source of long-distance information for language modeling, and then adopt the maximum entropy (ME) approach, instead of conventional linear interpolation, to integrate the semantic information with the n-gram model. Under the ME approach, each information source gives rise to a set of constraints that the combined model must satisfy. In experiments, ME language models trained on the China Times newswire corpus achieved a 40% perplexity reduction over the baseline bigram model.
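    The abstract describes the ME combination only at a high level. The following is a minimal sketch, not the authors' implementation, of how a conditional ME (log-linear) language model can combine an n-gram source with a topic source. It assumes a hypothetical toy corpus of (topic, previous word, next word) triples, one bigram feature and one topic-word feature per candidate word, and plain gradient ascent in place of whatever training procedure the paper uses. The names `DATA`, `features`, `cond_prob`, `train`, and `perplexity` are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus of (topic, previous word, next word) triples.
# The topic label stands in for the semantic-topic information source;
# how topics are actually inferred is not modeled here.
DATA = [
    ("sports", "the", "game"),
    ("sports", "the", "team"),
    ("finance", "the", "market"),
    ("finance", "the", "stock"),
    ("sports", "a", "game"),
    ("finance", "a", "market"),
]
VOCAB = sorted({w for _, _, w in DATA})


def features(topic, prev, word, use_topic=True):
    """Binary features active for history (topic, prev) and candidate word."""
    feats = [("bigram", prev, word)]  # n-gram information source
    if use_topic:
        feats.append(("topic", topic, word))  # semantic information source
    return feats


def cond_prob(weights, topic, prev, use_topic=True):
    """P(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h), over VOCAB."""
    scores = {
        w: sum(weights.get(f, 0.0) for f in features(topic, prev, w, use_topic))
        for w in VOCAB
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def train(data, use_topic=True, steps=500, lr=0.5):
    """Fit the lambdas by gradient ascent on the average log-likelihood.

    The gradient for lambda_i is (empirical count of f_i minus model-expected
    count of f_i) / N, so a stationary point is one where every feature's
    expected count matches its empirical count: the ME constraints.
    """
    empirical = defaultdict(float)
    for t, p, w in data:
        for f in features(t, p, w, use_topic):
            empirical[f] += 1.0
    weights = defaultdict(float)
    for _ in range(steps):
        expected = defaultdict(float)
        for t, p, _ in data:
            for w, pw in cond_prob(weights, t, p, use_topic).items():
                for f in features(t, p, w, use_topic):
                    expected[f] += pw
        for f in set(empirical) | set(expected):
            weights[f] += lr * (empirical[f] - expected[f]) / len(data)
    return weights


def perplexity(weights, data, use_topic=True):
    """exp of the average negative log-probability of each next word."""
    log_sum = sum(
        math.log(cond_prob(weights, t, p, use_topic)[w]) for t, p, w in data
    )
    return math.exp(-log_sum / len(data))


if __name__ == "__main__":
    bigram_only = train(DATA, use_topic=False)
    with_topic = train(DATA, use_topic=True)
    print("bigram-only ME perplexity:", perplexity(bigram_only, DATA, use_topic=False))
    print("bigram + topic ME perplexity:", perplexity(with_topic, DATA, use_topic=True))
```

    On this toy data the bigram-only model cannot distinguish "the game" in a sports context from "the market" in a finance context, while the added topic constraints can, so its training perplexity ends up lower. Features never seen in the data are pushed toward large negative weights here; a practical system would add smoothing or regularization, which this sketch omits.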
  • Keywords
    linguistics; maximum entropy methods; natural languages; ME language models; adaptive statistical language model; information source constraints; long distance information extraction; maximum entropy method; n-gram model; natural language regularities; perplexity reduction; semantic information; Automatic speech recognition; Computer science; Context modeling; Data mining; Electronic mail; Entropy; History; Information science; Interpolation; Natural languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing, 2004 International Symposium on
  • Print_ISBN
    0-7803-8678-7
  • Type
    conf
  • DOI
    10.1109/CHINSL.2004.1409648
  • Filename
    1409648