Title :
Speech selection and environmental adaptation for asynchronous speech recognition
Author :
Bo Ren;Longbiao Wang;Atsuhiko Kai;Zhaofeng Zhang
Author_Institution :
Nagaoka University of Technology, Nagaoka 940-2188, Japan
Abstract :
In this paper, we propose a robust distant-talking speech recognition system for asynchronous speech recordings. The system combines automatic asynchronous speech (microphone or mobile terminal) selection and environmental adaptation within a deep neural network based framework. Although applications using mobile terminals have attracted increasing attention, few studies have focused on distant-talking speech recognition with asynchronous mobile terminals. The proposed system uses bottleneck features (BFs) extracted from a Deep Neural Network (DNN) rather than conventional Mel-Frequency Cepstral Coefficients (MFCCs), and couples a state-of-the-art DNN acoustic model with environmental adaptation and automatic asynchronous speech selection. The proposed method was evaluated on a reverberant WSJCAM0 corpus, emitted by a loudspeaker and recorded by multiple far-field mobile terminals in a meeting room with multiple speakers. Using the BF-based DNN acoustic model with automatic asynchronous speech selection and environmental adaptation, the average Word Error Rate (WER) was reduced from 55.32% for the baseline system to 19.38%, a relative error reduction of 64.97%.
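The abstract does not spell out the criterion used for automatic asynchronous speech selection. As a minimal sketch, assuming the selection picks the terminal whose recording yields the highest average frame-level acoustic log-likelihood from the DNN acoustic model (an assumption, not the paper's stated method; all names below are hypothetical), it might look like this:

import numpy as np

def select_channel(log_likelihoods):
    # log_likelihoods: dict mapping terminal id -> 1-D array of per-frame
    # acoustic log-likelihood scores from the DNN acoustic model.
    # Returns the id of the terminal with the highest average score.
    return max(log_likelihoods, key=lambda ch: float(np.mean(log_likelihoods[ch])))

# Example: three asynchronous mobile terminals recording the same utterance.
scores = {
    "terminal_A": np.array([-7.1, -6.8, -7.4]),
    "terminal_B": np.array([-5.9, -6.1, -6.0]),  # best-scoring (e.g. closest) device
    "terminal_C": np.array([-8.3, -7.9, -8.1]),
}
print(select_channel(scores))  # -> "terminal_B"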
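The reported relative error reduction follows directly from the two WER figures:

\text{relative WER reduction} = \frac{55.32 - 19.38}{55.32} \times 100\% \approx 64.97\%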
Keywords :
"Speech","Speech recognition","Mobile communication","Hidden Markov models","Reverberation","Robustness","Training"
Conference_Title :
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
DOI :
10.1109/APSIPA.2015.7415485