DocumentCode
542298
Title
Language model adaptation through topic decomposition and MDI estimation
Author
Federico, Marcello
Author_Institution
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, I-38050 Povo di Trento, Italy
Volume
1
fYear
2002
fDate
13-17 May 2002
Abstract
This work presents a language model adaptation method that combines the latent semantic analysis framework with the minimum discrimination information (MDI) estimation criterion. In particular, an unsupervised topic model decomposition is built that allows topic-related word distributions to be inferred from very short adaptation texts. The resulting word distribution is then used to constrain the estimation of a minimum divergence trigram language model. With respect to previous work, implementation details are discussed that make such an approach effective for a large-scale application. Experimental results are provided for a digital library indexing task, namely the speech transcription of five historical documentary films. By adapting a trigram language model from the very terse content descriptions available for each film (at most ten words), a relative word error rate reduction of 3.2% was achieved.
Keywords
Cities and towns; Erbium; Films; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location
Orlando, FL, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.2002.5743832
Filename
5743832