Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection

Author

Yusuke Fujita;Ryoichi Takashima;Takeshi Homma;Rintaro Ikeshita;Yohei Kawaguchi;Takashi Sumiyoshi;Takashi Endo;Masahito Togami

Author_Institution

Hitachi, Ltd. Research and Development Group

fYear

2015

Firstpage

416

Lastpage

422

Abstract

In this paper, we propose a unified system that incorporates speech source separation and automatic speech recognition (ASR) for various noise environments. The proposed system has three features. The first is local Gaussian modeling (LGM) based source separation with an efficient permutation alignment method that integrates a power-spectrum-correlation-based method and a direction-of-arrival (DOA) based method. Evaluation results show that using the separated speech with the baseline acoustic modeling method reduces the word error rate (WER) significantly. The second is multi-condition training with per-utterance normalized features and noise-aware features in the acoustic modeling step. We show that this training method is effective even when the input signal has been distorted by the source separation step. The third is a word hypothesis selection method for integrating multiple recognition results. The selection method estimates correct words based on a recognizer's confidence and co-occurrence characteristics. Evaluation results show that it outperforms the conventional recognizer output voting error reduction (ROVER) method. The proposed system is evaluated using the third CHiME challenge dataset, where it achieves an improvement of 66.1% over the baseline system.
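
As an illustration of the "per-utterance normalized features" mentioned in the abstract, the following is a minimal sketch that assumes the normalization is a mean and variance normalization computed over the frames of a single utterance. The abstract does not state which normalization the authors use, so this choice, the function name `normalize_utterance`, and the `eps` parameter are all hypothetical.

```python
import math


def normalize_utterance(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance.

    Statistics are computed over the frames of one utterance only.
    Assumption: mean/variance normalization; the abstract does not
    specify the normalization actually used in the paper.

    frames: list of equal-length lists of floats, one list per frame.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension (population) standard deviation.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]
```

Because the statistics come from the utterance itself, each utterance is normalized independently of the others, which is what makes the scheme usable when noise conditions change from one recording to the next.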

Keywords

"Acoustics","Training","Speech","Hidden Markov models","Speech enhancement","Speech recognition","Source separation"

Publisher

ieee

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type

conf

DOI

10.1109/ASRU.2015.7404825

Filename

7404825