• DocumentCode
    2018231
  • Title
    Building topic mixture language models using the document soft classification notion of topic models
  • Author
    Bai, Shuanhu ; Leung, Cheung-Chi ; Huang, Chien-Lin ; Ma, Bin ; Li, Haizhou
  • Author_Institution
    Inst. for Infocomm Res., Singapore, Singapore
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    229
  • Lastpage
    232
  • Abstract
    We present a topic mixture language modeling approach that makes use of the soft classification notion of topic models. Given a set of text documents, we first perform document soft classification by applying a topic modeling process such as probabilistic latent semantic analysis (PLSA) or latent Dirichlet allocation (LDA) to the dataset. We then derive topic-specific n-gram counts from the classified texts. Finally, we build topic-specific n-gram language models (LMs) from these counts using a traditional n-gram modeling approach. During decoding, we perform topic inference from the processing context and use an unsupervised topic adaptation approach to combine the topic-specific models. Experimental results show that the proposed method outperforms state-of-the-art topic-model-based unsupervised adaptation approaches.
  • Keywords
    computational linguistics; inference mechanisms; natural language processing; pattern classification; probability; text analysis; document soft classification; latent Dirichlet allocation; n-gram language model; probabilistic latent semantic analyses; topic mixture language model (TMLM); unsupervised adaptation; Adaptation model; Buildings; Context; Context modeling; Probabilistic logic; Semantics; Training; language model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684904
  • Filename
    5684904