Regional Information Center for Science and Technology (RICEST) - BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge

DocumentCode :

3744879

Title :

BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge

Author :

Jahn Heymann; Lukas Drude; Aleksej Chinaev; Reinhold Haeb-Umbach

Author_Institution :

University of Paderborn, Department of Communications Engineering, Warburger Str. 100, Paderborn, Germany

Year :

2015

Firstpage :

444

Lastpage :

451

Abstract :

We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd-CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The utilized Generalized Eigenvalue beamforming operation with an optional Blind Analytic Normalization does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation, while at the same time only introducing very limited speech distortions. Our quite simple setup exploits the possibilities provided by simulated training data while still being able to generalize well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model yields a nearly 64% reduction of the word error rate on the real data test set.
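The beamforming step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`masked_psd`, `gev_weights`, `beamform`), the array layout (frequency, channel, time), the mask normalization and the diagonal loading constant `eps` are all assumptions made for illustration. The soft masks are taken as given inputs standing in for the BLSTM output, and the optional Blind Analytic Normalization is omitted. For each frequency bin, the sketch picks the weight vector that maximizes the ratio of speech power to noise power, which is the principal generalized eigenvector of the speech and noise spatial covariance matrices.

```python
import numpy as np

def masked_psd(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, D, T) = (frequency, channel, time)
    mask: soft mask in [0, 1], shape (F, T); assumed to come from a
          mask estimator such as the BLSTM described in the abstract
    """
    weighted = Y * mask[:, None, :]
    psd = np.einsum('fdt,fet->fde', weighted, Y.conj())
    # Normalizing by the mask sum is an assumption of this sketch.
    norm = np.maximum(mask.sum(axis=-1), 1e-10)
    return psd / norm[:, None, None]

def gev_weights(phi_xx, phi_nn, eps=1e-6):
    """Principal generalized eigenvector of (phi_xx, phi_nn) per bin."""
    F, D, _ = phi_xx.shape
    w = np.empty((F, D), dtype=complex)
    for f in range(F):
        # Small diagonal loading keeps the noise matrix invertible.
        load = eps * np.trace(phi_nn[f]).real / D
        nn = phi_nn[f] + load * np.eye(D)
        # Whiten with the Cholesky factor nn = L L^H, which turns the
        # generalized problem into an ordinary Hermitian one.
        L_inv = np.linalg.inv(np.linalg.cholesky(nn))
        A = L_inv @ phi_xx[f] @ L_inv.conj().T
        A = 0.5 * (A + A.conj().T)  # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
        w[f] = L_inv.conj().T @ vecs[:, -1]
    return w

def beamform(w, Y):
    """Apply w^H y for every time-frequency bin; returns shape (F, T)."""
    return np.einsum('fd,fdt->ft', w.conj(), Y)
```

Given a multichannel STFT `Y` and speech and noise masks `m_x` and `m_n` (hypothetical names), the enhanced signal would be `beamform(gev_weights(masked_psd(Y, m_x), masked_psd(Y, m_n)), Y)`. No steering vector or Direction-of-Arrival estimate enters the computation, which matches the property the abstract highlights.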

Keywords :

"Speech","Training","Speech recognition","Array signal processing","Estimation","Artificial neural networks"

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type :

conf

DOI :

10.1109/ASRU.2015.7404829

Filename :

7404829

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3744879