Title :
Unsupervised language model adaptation
Author :
Bacchiani, Michiel ; Roark, Brian
Author_Institution :
AT&T Labs-Research, USA
Abstract :
This paper investigates unsupervised language model adaptation from automatic speech recognition (ASR) transcripts. N-gram counts from these transcripts can be used either to adapt an existing n-gram model or to build an n-gram model from scratch. Experimental results are reported on a specific domain adaptation task: building a customer care application starting from a general voicemail transcription system. The experiments investigate the effectiveness of various adaptation strategies, including iterative adaptation and self-adaptation on the test data. They show an absolute error rate reduction of 3.9% over the unadapted baseline, from 28% to 24.1%, using 17 hours of unsupervised adaptation material. This is 51% of the 7.7% adaptation gain obtained by supervised adaptation. Self-adaptation on the test data resulted in a 1.3% improvement over the baseline.
Keywords :
iterative methods; natural languages; speech recognition; voice mail; ASR transcripts; adaptation strategies; customer care application; domain adaptation task; error rate reduction; iterative adaptation; n-gram counts; n-gram model; self-adaptation; unadapted baseline performance; unsupervised adaptation material; unsupervised language model adaptation; voicemail transcription system; Acoustic testing; Adaptation model; Automatic speech recognition; Automatic testing; Error analysis; Speech recognition; System testing; Training data; Vocabulary; Voice mail
Conference_Title :
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Proceedings
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198758