DocumentCode
2659976
Title
Continuous topic language modeling for speech recognition
Author
Chueh, Chuang-Hua ; Chien, Jen-Tzung
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
193
Lastpage
196
Abstract
A continuous representation of word sequences can effectively address the data sparseness problem in n-gram language models, where words are represented as discrete variables and unseen events are prone to occur. This problem becomes increasingly severe when extracting long-distance regularities with high-order n-gram models. Rather than working in the discrete word space, we construct a continuous space of word sequences from which latent topic information is extracted. The continuous vector is formed from the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can then be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain a significant performance improvement over the conventional topic-based language model.
Keywords
least mean squares methods; matrix algebra; natural language processing; probability; speech recognition; continuous topic language modeling; continuous vector; data sparseness problem; least-squares projection matrix; n-gram language model; topic posterior probability; Data mining; History; Multi-layer neural network; Natural languages; Neural networks; Predictive models; Probability distribution; Smoothing methods; Vocabulary; clustering methods;
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location
Goa
Print_ISBN
978-1-4244-3471-8
Electronic_ISBN
978-1-4244-3472-5
Type
conf
DOI
10.1109/SLT.2008.4777873
Filename
4777873