DocumentCode :
3744878
Title :
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices
Author :
Takuya Yoshioka;Nobutaka Ito;Marc Delcroix;Atsunori Ogawa;Keisuke Kinoshita;Masakiyo Fujimoto;Chengzhu Yu;Wojciech J. Fabian;Miquel Espi;Takuya Higuchi;Shoko Araki;Tomohiro Nakatani
Author_Institution :
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
fYear :
2015
Firstpage :
436
Lastpage :
443
Abstract :
CHiME-3 is a research community challenge organised in 2015 to evaluate speech recognition systems for mobile multi-microphone devices used in noisy daily environments. This paper describes NTT's CHiME-3 system, which integrates advanced speech enhancement and recognition techniques. Newly developed techniques include the use of spectral masks for acoustic beam-steering vector estimation and acoustic modelling with deep convolutional neural networks based on the "network in network" concept. In addition to these improvements, our system has several key differences from the official baseline system. The differences include multi-microphone training, dereverberation, and cross adaptation of neural networks with different architectures. The impacts that these techniques have on recognition performance are investigated. By combining these advanced techniques, our system achieves a 3.45% development error rate and a 5.83% evaluation error rate. Three simpler systems are also developed to perform evaluations with constrained set-ups.
Keywords :
"Acoustics","Training","Hidden Markov models","Speech recognition","Decoding","Speech enhancement","Speech"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404828
Filename :
7404828