Title :
Refine bigram PLSA model by assigning latent topics unevenly
Author :
Nie, Jiazhong ; Li, Runxin ; Luo, Dingsheng ; Wu, Xihong
Author_Institution :
Peking Univ., Beijing
Abstract :
As an important component of many speech and language processing applications, statistical language models have been widely investigated. The bigram topic model, which combines the advantages of both the traditional n-gram model and the topic model, has proven to be a promising language modeling approach. However, the original bigram topic model assigns the same number of latent topics to every context word, ignoring the fact that context words differ in the complexity of their latent semantics. In this paper, we present a new bigram topic model, the bigram PLSA model, and propose a modified training strategy that unevenly assigns latent topics to context words according to an estimate of their latent semantic complexity. As a consequence, a refined bigram PLSA model is obtained. Experiments on HUB4 Mandarin test transcriptions demonstrate the superiority of the bigram PLSA model over existing models, and further perplexity improvements are achieved through the use of the refined bigram PLSA model.
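The core quantity the abstract describes is a bigram probability expressed as a mixture over latent topics, p(w_i | w_{i-1}) = Σ_k p(z_k | w_{i-1}) p(w_i | z_k), where the number of topics available to each context word may differ ("uneven assignment"). The following is a minimal illustrative sketch of that idea, not the paper's actual training procedure: the vocabulary, topic counts, and the choice of letting each context mix over a prefix of a shared topic pool are all assumptions made for the example.

```python
import random

random.seed(0)

vocab = ["the", "bank", "river", "money"]

# Hypothetical per-context topic counts: semantically richer contexts
# are granted more latent topics (the "uneven assignment" idea).
topics_per_context = {"the": 3, "bank": 2, "river": 1, "money": 1}

K_TOTAL = 4  # size of the shared topic pool (illustrative)

def normalized(n):
    """A random probability distribution over n outcomes."""
    xs = [random.random() for _ in range(n)]
    s = sum(xs)
    return [x / s for x in xs]

# p(w | z): one word distribution per shared topic.
p_w_given_z = [dict(zip(vocab, normalized(len(vocab)))) for _ in range(K_TOTAL)]

# p(z | context): each context mixes only over its own K(c) topics.
p_z_given_ctx = {c: normalized(k) for c, k in topics_per_context.items()}

def bigram_prob(word, context):
    """p(word | context) as a mixture over the context's latent topics."""
    return sum(
        pz * p_w_given_z[k][word]
        for k, pz in enumerate(p_z_given_ctx[context])
    )

# Sanity check: each context's distribution over the vocabulary sums to 1,
# because it is a convex combination of valid word distributions.
for c in vocab:
    assert abs(sum(bigram_prob(w, c) for w in vocab) - 1.0) < 1e-9
```

The estimation of each context word's latent semantic complexity, which determines `topics_per_context` here, is the paper's contribution and is not reproduced in this sketch.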
Keywords :
computational linguistics; matrix decomposition; natural language processing; probability; speech processing; unsupervised learning; word processing; bigram PLSA model; bigram topic model; context words; language processing applications; matrix decomposition; n-gram model; probabilistic latent semantic analysis; speech processing applications; statistical language model; training strategy; uneven latent topic assignment; Auditory system; Context modeling; Laboratories; Linear discriminant analysis; Natural languages; Probability; Speech processing; Speech recognition; Tagging; Testing; PLSA; bigram topic model; language model; latent semantic
Conference_Titel :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430099