Title :
Speaking rate adaptation using continuous frame rate normalization
Author :
Chu, Stephen M. ; Povey, Daniel
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper describes a speaking rate adaptation technique for automatic speech recognition. The technique aims to reduce speaking rate variation by applying temporal warping in front-end processing so that the average phone duration, measured in feature frames, remains constant. Speaking rate is estimated from the timing information in unadapted decoding outputs. We implement the proposed continuous frame rate normalization (CFRN) technique on a state-of-the-art speech recognition architecture and evaluate it on the most recent GALE broadcast transcription tasks. Results show that CFRN gives consistent improvement across all four systems and both languages tested. In fact, the reported numbers represent the best decoding error rates on the corresponding test sets. It is further shown that the technique is effective without retraining and adds little overhead to the multi-pass recognition pipeline found in state-of-the-art transcription systems.
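The core idea in the abstract (choose a frame rate so that the average phone spans a constant number of feature frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the warping function, so the function names, the target of 8 frames per phone, and the clamping range of 5 to 15 ms are all hypothetical values chosen for illustration. The phone timings stand in for what a first, unadapted decoding pass might produce.

```python
from typing import List, Tuple

# (start, end) times in seconds for each phone, e.g. taken from the
# alignment of a first, unadapted decoding pass (hypothetical input format).
PhoneSegments = List[Tuple[float, float]]


def average_phone_duration(phones: PhoneSegments) -> float:
    """Mean phone duration in seconds."""
    if not phones:
        raise ValueError("need at least one phone segment")
    return sum(end - start for start, end in phones) / len(phones)


def normalized_frame_shift(
    phones: PhoneSegments,
    target_frames_per_phone: float = 8.0,  # assumed target, not from the paper
    min_shift: float = 0.005,              # assumed lower bound (seconds)
    max_shift: float = 0.015,              # assumed upper bound (seconds)
) -> float:
    """Frame shift (seconds) that makes the average phone span the target
    number of frames, clamped to a plausible range."""
    shift = average_phone_duration(phones) / target_frames_per_phone
    return max(min_shift, min(max_shift, shift))


# A fast speaker: short phones lead to a frame shift below 10 ms.
fast = [(0.00, 0.06), (0.06, 0.14), (0.14, 0.20)]
fast_shift = normalized_frame_shift(fast)

# A very slow speaker: the unclamped shift would be 25 ms, so it is clamped.
slow = [(0.0, 0.2)]
slow_shift = normalized_frame_shift(slow)
```

With the fast speaker above, the frame shift shrinks so that each phone still covers about 8 frames on average; with the slow speaker, the shift grows until it reaches the assumed upper bound. The continuous choice of frame shift, rather than a pick from a few fixed rates, is what the word "continuous" in CFRN suggests.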
Keywords :
adaptive signal processing; decoding; speech recognition; GALE broadcast transcription task; automatic speech recognition; continuous frame rate normalization; front-end processing; multipass recognition pipeline; speaking rate adaptation; unadapted decoding; Broadcasting; Error analysis; Hidden Markov models; Maximum likelihood decoding; Maximum likelihood linear regression; Pipelines; Testing; Timing; CFRN; frame rate normalization
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495656