• DocumentCode
    149616
  • Title

    Novel topic n-gram count LM incorporating document-based topic distributions and n-gram counts

  • Author

    Haidar, Md Akmal; O'Shaughnessy, D.

  • Author_Institution
    INRS-EMT, Montreal, QC, Canada
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    2310
  • Lastpage
    2314
  • Abstract
    In this paper, we introduce a novel topic n-gram count language model (NTNCLM) that uses the topic probabilities of training documents together with document-based n-gram counts. The topic probabilities of a document are computed by averaging the topic probabilities of the words seen in that document. Each document's topic probabilities are multiplied by its n-gram counts, and the products are summed over all training documents. The resulting sums serve as the n-gram counts of the respective topics, from which the NTNCLMs are built. The NTNCLMs are adapted using the topic probabilities of a development test set, computed in the same way. We compare our approach with a recently proposed TNCLM [1], which does not take into account long-range information outside the n-gram events. On the Wall Street Journal (WSJ) corpus, our approach yields significant perplexity and word error rate (WER) reductions over that approach.
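    The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy `word_topic` table (standing in for per-word topic probabilities from some topic model), the choice to average over word tokens rather than word types, and the uniform fallback for documents with no known words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def doc_topic_probs(tokens, word_topic, num_topics):
    """Average P(topic | word) over the tokens of one document.

    Assumption: the average runs over tokens (repeated words count
    repeatedly) and skips words without a topic distribution.
    """
    totals = [0.0] * num_topics
    seen = 0
    for w in tokens:
        probs = word_topic.get(w)
        if probs is None:
            continue
        seen += 1
        for k, p in enumerate(probs):
            totals[k] += p
    if seen == 0:
        # Assumption: fall back to a uniform distribution.
        return [1.0 / num_topics] * num_topics
    return [t / seen for t in totals]


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) of a single document."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def topic_ngram_counts(docs, word_topic, num_topics, n=2):
    """Sum P(topic | doc) * count(ngram in doc) over all documents."""
    counts = [defaultdict(float) for _ in range(num_topics)]
    for tokens in docs:
        p_d = doc_topic_probs(tokens, word_topic, num_topics)
        for gram, c in ngram_counts(tokens, n).items():
            for k in range(num_topics):
                counts[k][gram] += p_d[k] * c
    return counts


# Toy data (hypothetical): two topics, four known words.
word_topic = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "game": [0.1, 0.9],
    "team": [0.2, 0.8],
}
docs = [["stock", "market", "stock"], ["game", "team"]]
counts = topic_ngram_counts(docs, word_topic, num_topics=2, n=2)
```

    Because each document's topic probabilities sum to one, the fractional counts of an n-gram across all topics add up to its ordinary corpus count. Building smoothed language models from these counts and adapting their weights on a development set are outside the scope of this sketch.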
  • Keywords
    document handling; natural language processing; speech processing; NTNCLM; WER reductions; WSJ corpus; Wall Street Journal; document-based n-gram counts; document-based topic distributions; long-range information; topic n-gram count LM; topic n-gram count language model; topic probabilities; training documents; word error rate; Adaptation models; Computational modeling; Interpolation; Mathematical model; Semantics; Speech recognition; Training; Statistical n-gram language model; mixture models; speech recognition; topic models;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European
  • Conference_Location
    Lisbon
  • Type

    conf

  • Filename
    6952842