Improving DNN speaker independence with I-vector inputs

Author

Senior, Alan ; Lopez-Moreno, Ignacio

Author_Institution

Google Inc., New York, NY, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

225

Lastpage

229

Abstract

We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction. We address implementation of the algorithm for a streaming task.

Keywords

backpropagation; feature extraction; neural nets; speech processing; vectors; DNN speaker independence; I-vector inputs; WER; background normalization; channel normalization; deep neural network; speaker normalization; streaming task; utterance-level features; word error rates; Adaptation models; Computational modeling; Data models; Hidden Markov models; Neural networks; Speech; Training; Deep neural networks; Voice Search; i-vectors; large vocabulary speech recognition; speaker adaptation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6853591

Filename

6853591

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=177474