  • DocumentCode
    3744875
  • Title
    Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
  • Author
    Yusuke Fujita; Ryoichi Takashima; Takeshi Homma; Rintaro Ikeshita; Yohei Kawaguchi; Takashi Sumiyoshi; Takashi Endo; Masahito Togami
  • Author_Institution
    Hitachi, Ltd. Research and Development Group
  • fYear
    2015
  • Firstpage
    416
  • Lastpage
    422
  • Abstract
    In this paper, we propose a unified system that integrates speech source separation and automatic speech recognition for various noise environments. The proposed system has three main features. The first is local Gaussian modeling (LGM)-based source separation with an efficient permutation alignment method that integrates a power-spectrum-correlation-based method and a direction-of-arrival (DOA)-based method. Evaluation results show that using the separated speech with the baseline acoustic modeling method significantly reduces the word error rate (WER). The second is multi-condition training with per-utterance normalized features and noise-aware features in the acoustic modeling step. We show that the proposed training method remains effective even when the input signal has been distorted by the source separation step. The third is a word hypothesis selection method for integrating multiple recognition results. The proposed selection method estimates the correct words based on recognizer confidence and co-occurrence characteristics. Evaluation results show that it outperforms the conventional recognizer output voting error reduction (ROVER) method. The proposed system is evaluated on the third CHiME challenge dataset, where it achieves an improvement of 66.1% over the baseline system.
  • Keywords
    "Acoustics","Training","Speech","Hidden Markov models","Speech enhancement","Speech recognition","Source separation"
  • Publisher
    IEEE
  • Conference_Title
    2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
  • Type
    conf
  • DOI
    10.1109/ASRU.2015.7404825
  • Filename
    7404825