• DocumentCode
    3164956
  • Title
    Improvements to VTS feature enhancement
  • Author
    Li, Jinyu ; Seltzer, Michael L. ; Gong, Yifan
  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4677
  • Lastpage
    4680
  • Abstract
    By explicitly modelling the distortion of speech signals, model adaptation approaches based on vector Taylor series (VTS) have been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders their wide use, because they must adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods are computationally cheaper because they need neither multiple decoding passes nor adaptation of all the HMM parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters and noise adaptive training of the front-end GMM. In addition, we investigate other performance-related issues, such as the selection of FVTS algorithms and the spectral domain from which MFCCs are extracted. As an important result of our investigation, we establish that the FVTS method can achieve accuracy comparable to that of the MVTS method at a smaller runtime cost. This makes the FVTS method an ideal candidate for real-world tasks.
  • Keywords
    decoding; distortion; feature extraction; hidden Markov models; speech enhancement; speech recognition; FVTS methods; HMM model parameters; MVTS methods; VTS feature enhancement methods; VTS model adaptation method; computational cost; environment distortion parameters; environmental noise; front-end GMM; multiple decoding passes; noise adaptive training; real world tasks; speech recognizers; speech signals distortion; vector Taylor series-based model adaptation; Accuracy; Adaptation models; Hidden Markov models; Noise; Noise measurement; Nonlinear distortion; Speech; VTS; feature enhancement; model adaptation; robust ASR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288962
  • Filename
    6288962