Title :
I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription
Author :
Gupta, V. ; Kenny, P. ; Ouellet, Pierre ; Stafylakis, Themos
Author_Institution :
Centre de Rech. Inf. de Montreal, Montréal, QC, Canada
Abstract :
State-of-the-art speaker recognition systems are based on the i-vector representation of speech segments. In this paper we show how this representation can be used to perform blind speaker adaptation of a hybrid DNN-HMM speech recognition system, and we report excellent results on a French language audio transcription task. The implementation is very simple. An audio file is first diarized and each speaker cluster is represented by an i-vector. Acoustic feature vectors are augmented by the corresponding i-vectors before being presented to the DNN. (The same i-vector is used for all acoustic feature vectors aligned with a given speaker.) This supplementary information improves the DNN's ability to discriminate between phonetic events in a speaker-independent way without requiring any modification to the DNN training algorithms. We report results on the ETAPE 2011 transcription task and show that i-vector-based speaker adaptation is effective irrespective of whether cross-entropy or sequence training is used. For cross-entropy training, we obtained a word error rate (WER) reduction from 22.16% to 20.67%, whereas for sequence training the WER is reduced from 19.93% to 18.40%.
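The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, speaker labels, and toy dimensions below are hypothetical, and the diarization and i-vector extraction are assumed to have already produced one speaker label per frame and one i-vector per speaker cluster. A small helper also computes the relative WER reduction implied by the reported figures.

```python
from typing import Dict, List, Sequence


def augment_with_ivectors(
    features: Sequence[Sequence[float]],
    frame_speakers: Sequence[str],
    ivectors: Dict[str, Sequence[float]],
) -> List[List[float]]:
    """Append each frame's speaker-cluster i-vector to its acoustic features.

    Hypothetical helper: assumes diarization has already assigned one
    speaker label to every frame.
    """
    if len(features) != len(frame_speakers):
        raise ValueError("need exactly one speaker label per frame")
    augmented = []
    for frame, speaker in zip(features, frame_speakers):
        # The same i-vector is reused for every frame of a given speaker.
        augmented.append(list(frame) + list(ivectors[speaker]))
    return augmented


def relative_wer_reduction(baseline: float, adapted: float) -> float:
    """Relative WER reduction, in percent of the baseline WER."""
    return 100.0 * (baseline - adapted) / baseline


# Toy data (hypothetical): 3 frames of 2-dim features, 2 speaker clusters
# with 3-dim i-vectors. Real systems use far larger dimensions.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frame_speakers = ["spk_a", "spk_a", "spk_b"]
ivectors = {"spk_a": [1.0, -1.0, 0.5], "spk_b": [0.0, 2.0, -0.5]}

# Each row is what would be presented to the DNN input layer.
dnn_input = augment_with_ivectors(features, frame_speakers, ivectors)
```

Because the speaker information enters only through the input vector, the network's training procedure is unchanged, which matches the abstract's claim. Applying `relative_wer_reduction` to the reported figures gives roughly 6.7% relative for cross-entropy training (22.16% to 20.67%) and roughly 7.7% relative for sequence training (19.93% to 18.40%).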
Keywords :
audio signal processing; hidden Markov models; learning (artificial intelligence); neural nets; signal representation; speaker recognition; DNN training algorithms; ETAPE 2011 transcription; French broadcast audio transcription; French language audio transcription; WER; acoustic feature vectors; audio file diarization; blind speaker adaptation; cross-entropy training; deep neural networks; hidden Markov models; hybrid DNN-HMM speech recognition system; i-vector representation; i-vector-based speaker adaptation; sequence training; speaker recognition systems; speech segments; word error rate; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Transforms; Vectors; Deep Neural Networks; HMM; i-vectors; speaker adaptation; speech recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854823