DocumentCode
394231
Title
Conversational telephone speech recognition
Author
Gauvain, J.L. ; Lamel, L. ; Schwenk, H. ; Adda, G. ; Chen, L. ; Lefèvre, F.
Author_Institution
Spoken Language Processing Group, LIMSI-CNRS, Orsay, France
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
This paper describes the development of a speech recognition system for processing telephone conversations, starting from a state-of-the-art broadcast news (BN) transcription system. We identify the major changes and improvements in acoustic modeling, language modeling, and decoding that are required to achieve state-of-the-art performance on conversational speech. Major changes on the acoustic side include the use of speaker normalization (VTLN), the need to cope with channel variability, and the need for efficient speaker adaptation and better pronunciation modeling. On the linguistic side, the primary challenge is to cope with the limited amount of language model training data. To address this issue, we use a data selection technique and a smoothing technique based on a neural network language model. At the decoding level, lattice rescoring and minimum word error decoding are applied. On the development data, the improvements yield an overall word error rate of 24.9%, whereas the original BN transcription system had a word error rate of about 50% on the same data.
Keywords
decoding; neural nets; normalising; speech recognition; BN transcription system; acoustic modeling; broadcast news transcription system; channel variability; data selection technique; language model training data; language modeling; lattice rescoring; minimum word error decoding; neural network language model; pronunciation modeling; smoothing technique; speaker adaptation; speaker normalization; speech recognition system; telephone conversations; word error rate; Broadcasting; Decoding; Error analysis; Loudspeakers; Natural languages; Neural networks; Smoothing methods; Speech recognition; Telephony; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198755
Filename
1198755