DocumentCode :
3744879
Title :
BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge
Author :
Jahn Heymann;Lukas Drude;Aleksej Chinaev;Reinhold Haeb-Umbach
Author_Institution :
University of Paderborn, Department of Communications Engineering, Warburger Str. 100, Paderborn, Germany
fYear :
2015
Firstpage :
444
Lastpage :
451
Abstract :
We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bidirectional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The Generalized Eigenvalue beamforming operation we use, with an optional Blind Analytic Normalization, does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation while introducing only very limited speech distortion. Our quite simple setup exploits the possibilities provided by simulated training data while still generalizing well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model yields a nearly 64% reduction of the word error rate on the real data test set.
Keywords :
"Speech","Training","Speech recognition","Array signal processing","Estimation","Artificial neural networks"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404829
Filename :
7404829