Regional Information Center for Science and Technology - I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription

DocumentCode :

179887

Title :

I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription

Author :

Gupta, V. ; Kenny, P. ; Ouellet, Pierre ; Stafylakis, Themos

Author_Institution :

Centre de Recherche Informatique de Montréal (CRIM), Montréal, QC, Canada

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

6334

Lastpage :

6338

Abstract :

State-of-the-art speaker recognition systems are based on the i-vector representation of speech segments. In this paper we show how this representation can be used to perform blind speaker adaptation of a hybrid DNN-HMM speech recognition system, and we report excellent results on a French language audio transcription task. The implementation is very simple. An audio file is first diarized and each speaker cluster is represented by an i-vector. Acoustic feature vectors are augmented by the corresponding i-vectors before being presented to the DNN. (The same i-vector is used for all acoustic feature vectors aligned with a given speaker.) This supplementary information improves the DNN's ability to discriminate between phonetic events in a speaker-independent way without any modification to the DNN training algorithms. We report results on the ETAPE 2011 transcription task, and show that i-vector-based speaker adaptation is effective irrespective of whether cross-entropy or sequence training is used. For cross-entropy training, the word error rate (WER) falls from 22.16% to 20.67%, whereas for sequence training the WER falls from 19.93% to 18.40%.
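
The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature values, the speaker labels, and the two-dimensional i-vectors are all hypothetical, and the diarization and i-vector extraction are assumed to have been done elsewhere. The sketch only shows that every frame assigned to a speaker cluster receives that cluster's i-vector as extra input dimensions, and how the relative WER reduction follows from the figures quoted above.

```python
from typing import Dict, List, Sequence


def augment_with_ivectors(
    frames: Sequence[Sequence[float]],
    frame_speakers: Sequence[str],
    speaker_ivectors: Dict[str, Sequence[float]],
) -> List[List[float]]:
    """Append each frame's speaker i-vector to its acoustic features.

    All names here are illustrative assumptions, not taken from the paper.
    """
    if len(frames) != len(frame_speakers):
        raise ValueError("need exactly one speaker label per frame")
    augmented = []
    for frame, speaker in zip(frames, frame_speakers):
        # The same i-vector is reused for every frame of a given speaker.
        augmented.append(list(frame) + list(speaker_ivectors[speaker]))
    return augmented


def relative_wer_reduction(baseline: float, adapted: float) -> float:
    """Relative WER reduction, in percent of the baseline WER."""
    return 100.0 * (baseline - adapted) / baseline


# Toy data (hypothetical): three 2-dimensional frames, two speaker clusters.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frame_speakers = ["spk_a", "spk_a", "spk_b"]
speaker_ivectors = {"spk_a": [1.0, -1.0], "spk_b": [0.5, 0.25]}

augmented = augment_with_ivectors(frames, frame_speakers, speaker_ivectors)

# WER figures taken from the abstract.
cross_entropy_gain = relative_wer_reduction(22.16, 20.67)
sequence_gain = relative_wer_reduction(19.93, 18.40)
```

In this sketch the DNN input dimension simply grows by the i-vector dimension, which is consistent with the abstract's statement that the training algorithms themselves need no modification.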

Keywords :

audio signal processing; hidden Markov models; learning (artificial intelligence); neural nets; signal representation; speaker recognition; DNN training algorithms; ETAPE 2011 transcription; French broadcast audio transcription; French language audio transcription; WER; acoustic feature vectors; audio file diarization; blind speaker adaptation; cross-entropy training; deep neural networks; hidden Markov models; hybrid DNN-HMM speech recognition system; i-vector representation; i-vector-based speaker adaptation; sequence training; speaker recognition systems; speech segments; word error rate; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Transforms; Vectors; Deep Neural Networks; HMM; i-vectors; speaker adaptation; speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854823

Filename :

6854823

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=179887