• DocumentCode
    106564
  • Title

    Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs

  • Author

    Heigold, Georg ; Ney, Hermann ; Schluter, Ralf

  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • Volume
    21
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    2616
  • Lastpage
    2626
  • Abstract
    Today's speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture models whose parameters are estimated using a discriminative training criterion such as Maximum Mutual Information (MMI) or Minimum Phone Error (MPE). Currently, the optimization is almost always done with (empirical variants of) Extended Baum-Welch (EBW). This type of optimization requires sophisticated update schemes for the step sizes and a considerable amount of parameter tuning, and little is known about its convergence behavior. In this paper, we derive an EM-style algorithm for discriminative training of HMMs. Like Expectation-Maximization (EM) for the generative training of HMMs, the proposed algorithm improves the training criterion on each iteration, converges to a local optimum, and is completely parameter-free. We investigate the feasibility of the proposed EM-style algorithm for discriminative training on two tasks, namely grapheme-to-phoneme conversion and spoken digit string recognition.
  • Keywords
    Gaussian processes; expectation-maximisation algorithm; hidden Markov models; optimisation; speech recognition; EBW; Gaussian mixture models; HMM; MMI; MPE; discriminative training criterion; expectation-maximization-style optimization algorithm; extended Baum-Welch; grapheme-to-phoneme conversion; maximum mutual information; minimum phone error; parameter tuning; spoken digit string recognition; Gaussian mixture model; Hidden Markov models; Optimization; Training; Expectation-maximization; discriminative training; generalized iterative scaling; hidden Markov model
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2013.2280234
  • Filename
    6588352