Title :
BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge
Author :
Jahn Heymann;Lukas Drude;Aleksej Chinaev;Reinhold Haeb-Umbach
Author_Institution :
University of Paderborn, Department of Communications Engineering, Warburger Str. 100, Paderborn, Germany
Abstract :
We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd CHiME Speech Separation and Recognition Challenge. Without any modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system on the relevant test data set. Our approach uses a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The Generalized Eigenvalue beamformer, with an optional Blind Analytic Normalization, does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation while introducing only very limited speech distortion. Our comparatively simple setup exploits simulated training data while still generalizing well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model yields a nearly 64% reduction of the word error rate on the real data test set.
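The mask-driven Generalized Eigenvalue beamforming step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that speech and noise masks are already available (here they are hand-made oracle masks on synthetic data, standing in for the BLSTM output), that one mask per time-frequency bin is shared across channels, and it omits the optional Blind Analytic Normalization. The function names `masked_psd`, `gev_weights` and `apply_beamformer`, the diagonal-loading parameter `eps`, and all array shapes are hypothetical choices made for illustration.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance (PSD) matrices.

    Y:    complex STFT, shape (F, T, D) = (frequency, frame, channel)
    mask: real-valued soft mask in [0, 1], shape (F, T)
    Returns an array of shape (F, D, D), one Hermitian matrix per frequency.
    """
    weighted = Y * mask[..., None]
    phi = np.einsum('ftd,fte->fde', weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return phi / norm[:, None, None]


def gev_weights(phi_xx, phi_nn, eps=1e-6):
    """Per frequency, return the principal generalized eigenvector of
    (phi_xx, phi_nn), i.e. the w maximizing (w^H phi_xx w) / (w^H phi_nn w).

    eps scales a small diagonal loading of phi_nn (an assumption of this
    sketch) so that the Cholesky factorization is well defined.
    """
    F, D, _ = phi_xx.shape
    w = np.empty((F, D), dtype=complex)
    for f in range(F):
        load = eps * np.trace(phi_nn[f]).real / D
        phi_n = phi_nn[f] + load * np.eye(D)
        L = np.linalg.cholesky(phi_n)            # phi_n = L @ L^H
        L_inv = np.linalg.inv(L)
        A = L_inv @ phi_xx[f] @ L_inv.conj().T   # whitened speech PSD
        A = 0.5 * (A + A.conj().T)               # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
        w[f] = L_inv.conj().T @ vecs[:, -1]      # map back: w = L^{-H} v
    return w


def apply_beamformer(w, Y):
    """Compute w^H y for every time-frequency bin; returns shape (F, T)."""
    return np.einsum('fd,ftd->ft', w.conj(), Y)


# Synthetic demo: one source through a random per-frequency transfer vector,
# plus spatially white noise. Speech is active only in the first half.
rng = np.random.default_rng(0)
F, T, D = 3, 200, 4
h = rng.standard_normal((F, D)) + 1j * rng.standard_normal((F, D))
s = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
s[:, T // 2:] = 0.0
n = rng.standard_normal((F, T, D)) + 1j * rng.standard_normal((F, T, D))
Y = h[:, None, :] * s[:, :, None] + n

# Oracle masks standing in for the network output (assumption).
speech_mask = np.zeros((F, T))
speech_mask[:, :T // 2] = 1.0
noise_mask = 1.0 - speech_mask

phi_xx = masked_psd(Y, speech_mask)
phi_nn = masked_psd(Y, noise_mask)
w = gev_weights(phi_xx, phi_nn)
enhanced = apply_beamformer(w, Y)
```

In this sketch the beamformer is derived purely from the two covariance estimates, which is why no Direction-of-Arrival estimate appears anywhere. The weights are determined only up to a complex scale per frequency, which is the ambiguity a normalization step would address.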
Keywords :
"Speech","Training","Speech recognition","Array signal processing","Estimation","Artificial neural networks"
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
DOI :
10.1109/ASRU.2015.7404829