Title :
An unsupervised web-based topic language model adaptation method
Author :
Lecorvé, Gwénolé ; Gravier, Guillaume ; Sébillot, Pascale
Author_Institution :
IRISA, Rennes
Date :
March 31 - April 4, 2008
Abstract :
This paper addresses the problem of adapting automatic speech recognition (ASR) systems, whose language models (LMs) are usually trained on topic-independent corpora, to new topics, particularly for broadcast news. We propose a complete, fully unsupervised technique that uses information retrieval methods to select keywords from each segment and then builds a thematically coherent adaptation corpus from the Internet. The LM used for the initial transcription is adapted with this corpus before word lattices are rescored. Experimental results support the proposed technique, showing a significant reduction in perplexity after LM adaptation. Word error rates also improve in some cases, though to a lesser extent.
Keywords :
Internet; natural language processing; query processing; speech recognition; information retrieval; natural languages; rescoring word lattices; topic-independent corpora; unsupervised Web-based topic language model; word error rates; Adaptation model; Automatic speech recognition; Broadcasting; Error analysis; Lattices; Streaming media; Vocabulary
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518801
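
Illustrative sketch :
The abstract describes selecting keywords per segment with information retrieval methods and adapting the LM before rescoring. The record does not give the actual criterion or adaptation scheme, so the following is only a minimal sketch under stated assumptions: keywords are ranked with a smoothed tf-idf score, the LM is reduced to a unigram distribution, and adaptation is a plain linear interpolation. The function names (`select_keywords`, `interpolate`, `perplexity`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def select_keywords(segment_tokens, background_docs, k=3):
    """Rank the words of one segment by a smoothed tf-idf score (assumed
    criterion, not necessarily the paper's) and return the top k."""
    n_docs = len(background_docs)
    # Document frequency of each word in the background collection.
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))
    tf = Counter(segment_tokens)
    scores = {
        w: (count / len(segment_tokens)) * math.log((1 + n_docs) / (1 + df[w]))
        for w, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def interpolate(p_general, p_adapt, lam=0.5):
    """Linearly interpolate two unigram distributions (dicts word -> prob).
    lam is the weight of the general model; its value here is arbitrary."""
    vocab = set(p_general) | set(p_adapt)
    return {
        w: lam * p_general.get(w, 0.0) + (1 - lam) * p_adapt.get(w, 0.0)
        for w in vocab
    }


def perplexity(model, tokens):
    """Unigram perplexity of a token sequence; every token must have
    non-zero probability in the model."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (hypothetical): one segment and a tiny background collection.
segment = ["the", "election", "results", "the", "vote"]
background = [["the", "weather"], ["the", "match"], ["the", "vote", "count"]]
keywords = select_keywords(segment, background, k=2)

# Toy unigram models: a general one and one estimated on topical text.
p_general = {"a": 0.5, "b": 0.5}
p_adapt = {"a": 0.9, "b": 0.1}
adapted = interpolate(p_general, p_adapt, lam=0.5)
```

In a full pipeline the selected keywords would be sent as web queries to gather the adaptation corpus, and the adapted model would be used to rescore word lattices; both steps are omitted here because they depend on external services and tools that the record does not specify.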