• DocumentCode
    730833
  • Title

    Document-specific context PLSA language model for speech recognition

  • Author

    Haidar, Md Akmal; O'Shaughnessy, Douglas

  • Author_Institution
    INRS-EMT, Montreal, QC, Canada
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    5326
  • Lastpage
    5330
  • Abstract
    In this paper, we introduce a document-specific context probabilistic latent semantic analysis (DCPLSA) model for speech recognition. This is an extension of a CPLSA model [1] in which the probability of a word is conditioned only on topics. The CPLSA model uses corpus-level bigram counts, i.e., the number of times each bigram appears in the corpus. Each such count is the sum of the bigram's counts over individual documents, in which the bigram may be used to describe different topics. We address this problem by introducing the document-specific CPLSA (DCPLSA) model, in which the probability of a word is conditioned on both topic and document. In experiments on a continuous speech recognition (CSR) task using the Wall Street Journal (WSJ) corpus, the proposed DCPLSA approach yields significant reductions in both perplexity and word error rate (WER) compared with other approaches from the literature.
  • Keywords
    probability; speech recognition; Wall Street Journal corpus; bigram counts; continuous speech recognition; document-specific context PLSA language model; document-specific context probabilistic latent semantic analysis model; word error rate; Adaptation models; Computational modeling; Context; Context modeling; Mathematical model; Speech recognition; Training; Topic models; bigram PLSA models; context-based PLSA language model; statistical language model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type

    conf

  • DOI
    10.1109/ICASSP.2015.7178988
  • Filename
    7178988
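
The abstract notes that a corpus-level bigram count is the sum of that bigram's counts in individual documents. The following is a minimal sketch of that bookkeeping only, not of the DCPLSA model or its training procedure. The toy documents and the helper name `bigram_counts` are assumptions made for illustration and do not come from the paper or the WSJ corpus.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in one tokenized document.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return Counter(zip(tokens, tokens[1:]))


# Toy documents (assumed data, not from the WSJ corpus).
docs = {
    "d1": "the bank raised interest rates".split(),
    "d2": "the river bank flooded".split(),
    "d3": "the bank raised fees".split(),
}

# Document-specific counts: one Counter per document.
per_doc = {doc_id: bigram_counts(tokens) for doc_id, tokens in docs.items()}

# Corpus-level counts: the sum of the per-document counts.
corpus = Counter()
for counts in per_doc.values():
    corpus.update(counts)

# The pooled count hides which documents contributed each occurrence.
print(corpus[("the", "bank")])
print({doc_id: c[("the", "bank")] for doc_id, c in per_doc.items()})
```

Keeping `per_doc` alongside `corpus` retains the per-document breakdown that is lost when only the pooled counts are stored.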