• DocumentCode
    2659976
  • Title

    Continuous topic language modeling for speech recognition

  • Author

    Chueh, Chuang-Hua ; Chien, Jen-Tzung

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    193
  • Lastpage
    196
  • Abstract
    Continuous representation of word sequences can effectively solve the data sparseness problem in n-gram language models, where words are represented as discrete variables and unseen events are prone to occur. This problem becomes increasingly severe when extracting long-distance regularities with high-order n-gram models. Rather than working in the discrete word space, we construct a continuous space of word sequences from which latent topic information is extracted. The continuous vector is formed from the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can then be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain a significant performance improvement over the conventional topic-based language model.
  • Keywords
    least mean squares methods; matrix algebra; natural language processing; probability; speech recognition; continuous topic language modeling; continuous vector; data sparseness problem; least-squares projection matrix; n-gram language model; topic posterior probability; Data mining; History; Multi-layer neural network; Natural languages; Neural networks; Predictive models; Probability distribution; Smoothing methods; Vocabulary; clustering methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
  • Conference_Location
    Goa
  • Print_ISBN
    978-1-4244-3471-8
  • Electronic_ISBN
    978-1-4244-3472-5
  • Type

    conf

  • DOI
    10.1109/SLT.2008.4777873
  • Filename
    4777873