DocumentCode :
2788409
Title :
Semi-supervised learning of language model using unsupervised topic model
Author :
Bai, Shuanhu ; Huang, Chien-Lin ; Ma, Bin ; Li, Haizhou
Author_Institution :
Inst. for Infocomm Res., Singapore, Singapore
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5386
Lastpage :
5389
Abstract :
We present a semi-supervised learning (SSL) method for building domain-specific language models (LMs) from general-domain data using probabilistic latent semantic analysis (PLSA). The proposed technique first performs topic decomposition (TD) on the combined dataset of domain-specific and general-domain data. It then derives the latent topic distribution of the domain of interest and computes domain-specific word n-gram counts with a PLSA-style mixture model. Finally, it applies traditional n-gram modeling to construct domain-specific LMs from these counts. Experimental results show that this technique outperforms both state-of-the-art relative-entropy-based text selection and traditional supervised training methods.
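The pipeline in the abstract can be sketched in miniature. The code below is a toy illustration only, not the paper's exact formulation: it fits a small PLSA model by EM on a combined corpus, identifies the topic most prominent in an in-domain seed document, and weights each document's unigram counts by that topic's posterior to produce "domain-specific" counts. All function names, the toy corpus, and the unigram simplification (the paper uses n-grams) are assumptions for illustration.

```python
import random
from collections import Counter

def plsa_em(docs, n_topics=2, n_iter=30, seed=0):
    """Toy PLSA fit by EM on bag-of-words docs.
    Returns (p_z_d, p_w_z): per-doc topic mixtures and per-topic word distributions."""
    rng = random.Random(seed)
    counts = [Counter(d) for d in docs]
    vocab = sorted({w for d in docs for w in d})
    # Random, normalized initialization of P(z|d) and P(w|z).
    p_z_d = []
    for _ in docs:
        v = [rng.random() + 1e-3 for _ in range(n_topics)]
        s = sum(v)
        p_z_d.append([x / s for x in v])
    p_w_z = []
    for _ in range(n_topics):
        v = {w: rng.random() + 1e-3 for w in vocab}
        s = sum(v.values())
        p_w_z.append({w: x / s for w, x in v.items()})
    for _ in range(n_iter):
        new_zd = [[1e-12] * n_topics for _ in docs]
        new_wz = [{w: 1e-12 for w in vocab} for _ in range(n_topics)]
        for d, cnt in enumerate(counts):
            for w, c in cnt.items():
                denom = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics))
                for z in range(n_topics):
                    # E-step responsibility c(w,d) * P(z|d,w), accumulated for the M-step.
                    post = c * p_z_d[d][z] * p_w_z[z][w] / denom
                    new_zd[d][z] += post
                    new_wz[z][w] += post
        # M-step: renormalize the expected counts.
        for d in range(len(docs)):
            s = sum(new_zd[d])
            p_z_d[d] = [x / s for x in new_zd[d]]
        for z in range(n_topics):
            s = sum(new_wz[z].values())
            p_w_z[z] = {w: x / s for w, x in new_wz[z].items()}
    return p_z_d, p_w_z

def domain_weighted_counts(docs, seed_idx, p_z_d):
    """Weight each doc's word counts by the posterior of the seed doc's dominant topic."""
    z_star = max(range(len(p_z_d[seed_idx])), key=lambda z: p_z_d[seed_idx][z])
    weighted = Counter()
    for d, doc in enumerate(docs):
        for w in doc:
            weighted[w] += p_z_d[d][z_star]
    return weighted

# Hypothetical combined corpus: docs 0-1 are "finance" (the in-domain seed topic),
# docs 2-3 are general-domain text on another topic.
docs = [["stock", "market", "trade", "price"],
        ["stock", "price", "market", "index"],
        ["soccer", "goal", "match", "team"],
        ["team", "match", "goal", "league"]]
p_z_d, p_w_z = plsa_em(docs)
counts = domain_weighted_counts(docs, seed_idx=0, p_z_d=p_z_d)
```

In the paper's actual method these weighted counts would be gathered over word n-grams and fed to standard n-gram LM estimation; the sketch stops at the count-derivation step.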
Keywords :
learning (artificial intelligence); natural language processing; statistical analysis; PLSA style mixture model; domain-specific language models; domain-specific word n-gram counts; language model learning; probabilistic latent semantic analysis; relative entropy text selection; semi-supervised learning; topic decomposition; unsupervised topic model; Bridges; Buildings; Computer science; Domain specific languages; Entropy; Joining processes; Learning systems; Semisupervised learning; Statistical distributions; Statistics; language model; semi-supervised learning; topic model;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5494940
Filename :
5494940