DocumentCode
3427466
Title
An unsupervised web-based topic language model adaptation method
Author
Lecorvé, Gwénolé ; Gravier, Guillaume ; Sébillot, Pascale
Author_Institution
IRISA, Rennes
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
5081
Lastpage
5084
Abstract
This paper focuses on a solution to better adapt ASR systems, whose language models (LM) are usually trained on topic-independent corpora, to new topics, in particular in the case of broadcast news. We propose a new, complete, and fully unsupervised technique that selects keywords from each segment using information retrieval methods to build a thematically coherent adaptation corpus from the Internet. The LM used for the initial transcription is then adapted before rescoring word lattices. Experimental results demonstrate the validity of the proposed adaptation technique, with a significant reduction in perplexity after LM adaptation. Word error rates are also improved in some cases, though to a lesser extent.
Keywords
Internet; natural language processing; query processing; speech recognition; information retrieval; natural languages; rescoring word lattices; topic-independent corpora; unsupervised Web-based topic language model; word error rates; Adaptation model; Automatic speech recognition; Broadcasting; Error analysis; Lattices; Streaming media; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV
ISSN
1520-6149
Print_ISBN
978-1-4244-1483-3
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2008.4518801
Filename
4518801