DocumentCode :
2659976
Title :
Continuous topic language modeling for speech recognition
Author :
Chueh, Chuang-Hua ; Chien, Jen-Tzung
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
193
Lastpage :
196
Abstract :
A continuous representation of word sequences can effectively solve the data sparseness problem in n-gram language models, where words are represented as discrete variables and unseen events are likely to occur. This problem becomes increasingly severe when extracting long-distance regularities with high-order n-gram models. Rather than working in the discrete word space, we construct a continuous space of word sequences from which latent topic information is extracted. The continuous vector is formed from the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can then be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain a significant performance improvement over the conventional topic-based language model.
Keywords :
least mean squares methods; matrix algebra; natural language processing; probability; speech recognition; continuous topic language modeling; continuous vector; data sparseness problem; least-squares projection matrix; n-gram language model; topic posterior probability; Data mining; History; Multi-layer neural network; Natural languages; Neural networks; Predictive models; Probability distribution; Smoothing methods; Vocabulary; clustering methods
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
Type :
conf
DOI :
10.1109/SLT.2008.4777873
Filename :
4777873