DocumentCode
3744875
Title
Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
Author
Yusuke Fujita;Ryoichi Takashima;Takeshi Homma;Rintaro Ikeshita;Yohei Kawaguchi;Takashi Sumiyoshi;Takashi Endo;Masahito Togami
Author_Institution
Hitachi, Ltd. Research and Development Group
fYear
2015
Firstpage
416
Lastpage
422
Abstract
We propose a unified system that incorporates speech source separation and automatic speech recognition (ASR) for various noise environments. The system has three main features. The first is source separation based on local Gaussian modeling (LGM), with an efficient permutation alignment method that integrates a power-spectrum-correlation-based method and a direction-of-arrival (DOA) based method. Evaluation results show that using the separated speech with the baseline acoustic modeling method significantly reduces the word error rate (WER). The second is multi-condition training with per-utterance normalized features and noise-aware features in the acoustic modeling step; we show that this training method is effective even when the input signal has been distorted by the source separation step. The third is a word hypothesis selection method for integrating multiple recognition results, which estimates correct words based on a recognizer's confidence and co-occurrence characteristics. Evaluation results show that this selection method outperforms the conventional recognizer output voting error reduction (ROVER) method. Evaluated on the third CHiME challenge dataset, the proposed system achieved an improvement of 66.1% over the baseline system.
Keywords
"Acoustics","Training","Speech","Hidden Markov models","Speech enhancement","Speech recognition","Source separation"
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type
conf
DOI
10.1109/ASRU.2015.7404825
Filename
7404825