DocumentCode :
3427466
Title :
An unsupervised web-based topic language model adaptation method
Author :
Lecorvé, Gwénolé ; Gravier, Guillaume ; Sébillot, Pascale
Author_Institution :
IRISA, Rennes
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
5081
Lastpage :
5084
Abstract :
This paper focuses on a solution to better adapt ASR systems, whose language models (LM) are usually trained on topic-independent corpora, to new topics, in particular in the case of broadcast news. We propose a new complete and fully unsupervised technique that selects keywords from each segment using information retrieval methods, to build a thematically coherent adaptation corpus from the Internet. The LM used for the initial transcription is then adapted before rescoring word lattices. Experimental results demonstrate the validity of the proposed adaptation technique with a significant reduction of the perplexity after LM adaptation. Word error rates are also improved in some cases, though to a lesser extent.
Keywords :
Internet; natural language processing; query processing; speech recognition; information retrieval; natural languages; rescoring word lattices; topic-independent corpora; unsupervised Web-based topic language model; word error rates; Adaptation model; Automatic speech recognition; Broadcasting; Error analysis; Lattices; Streaming media; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518801
Filename :
4518801