DocumentCode :
3455510
Title :
Unsupervised language model adaptation using n-gram weighting
Author :
Haidar, Md Akmal ; O'Shaughnessy, D.
Author_Institution :
INRS-EMT, Montreal, QC, Canada
fYear :
2011
fDate :
8-11 May 2011
Abstract :
In this paper, we introduce a weighting of topic models for mixture language model adaptation that uses the n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns each document to a single topic, chosen as the topic contributing the largest number of that document's words in a Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics generated by hard clustering are then used to compute the mixture weights of the component topic models. Instead of using all the words of the training vocabulary, the LDA analysis uses a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) over the unigram weighting approach used in the literature.
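The abstract's pipeline (LDA hard clustering followed by n-gram-based mixture weighting) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy `word_topic` map stands in for a real LDA inference step, the corpus is invented, and add-one smoothing is an assumption; it only shows the mechanics of assigning each document to its dominant topic and weighting topic models by the bigram probability they assign to adaptation text.

```python
from collections import Counter, defaultdict

# Hypothetical word -> topic assignments; in the paper these would come
# from an LDA analysis over a selected subset of the training vocabulary.
word_topic = {"goal": 0, "match": 0, "team": 0,
              "stock": 1, "market": 1, "price": 1}

docs = [
    "team goal match goal price".split(),
    "stock market price stock".split(),
]

# Hard clustering: assign each document to the single topic that
# contributes the largest number of its words.
def hard_cluster(doc):
    counts = Counter(word_topic[w] for w in doc if w in word_topic)
    return counts.most_common(1)[0][0]

clusters = defaultdict(list)
for doc in docs:
    clusters[hard_cluster(doc)].append(doc)

vocab = set(word_topic)

# Per-topic bigram model with add-one smoothing (an assumed choice;
# the paper does not specify smoothing in the abstract).
def bigram_model(topic_docs):
    bigrams, unigrams = Counter(), Counter()
    for doc in topic_docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))
    return bigrams, unigrams

models = {t: bigram_model(ds) for t, ds in clusters.items()}

def bigram_prob(model, w1, w2):
    bigrams, unigrams = model
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

# n-gram weighting: each topic model's mixture weight is proportional
# to the probability it assigns to the adaptation text's bigrams.
def mixture_weights(text):
    scores = {}
    for t, model in models.items():
        p = 1.0
        for w1, w2 in zip(text, text[1:]):
            p *= bigram_prob(model, w1, w2)
        scores[t] = p
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

weights = mixture_weights("stock market price".split())
print(weights)  # the finance topic (1) receives the larger weight
```

A unigram weighting baseline would score each topic by unigram probabilities alone; the bigram scoring above is the kind of higher-order evidence the proposed approach exploits.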
Keywords :
information retrieval; pattern clustering; statistical analysis; text analysis; unsupervised learning; LDA analysis; component topic models; hard clustering method; information retrieval; latent Dirichlet allocation analysis; mixture language model adaptation; mixture weights; n-gram weighting approach; training vocabulary; unsupervised language model adaptation; word error rate; Adaptation models; Computational modeling; Mathematical model; Probabilistic logic; Semantics; Training; Vocabulary; Mixture models; language model adaptation; latent Dirichlet allocation; speech recognition;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Electrical and Computer Engineering (CCECE), 2011 24th Canadian Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
0840-7789
Print_ISBN :
978-1-4244-9788-1
Electronic_ISBN :
0840-7789
Type :
conf
DOI :
10.1109/CCECE.2011.6030578
Filename :
6030578