• DocumentCode
    2932879
  • Title
    Minimum phone error based stream weight training for Mandarin audio-visual speech recognition
  • Author
    Wu, Guanyong ; Zhu, Jie ; Xu, Haihua
  • Author_Institution
    Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2009
  • fDate
    June 28 2009-July 3 2009
  • Firstpage
    902
  • Lastpage
    905
  • Abstract
    Stream weight training is one of the key issues in bimodal integration for audio-visual speech recognition. In this paper, audio-only and video-only HMM classifiers are combined to perform audio-visual speech recognition. More specifically, a discriminative training method is proposed in which state-dependent stream weights are trained by lattice rescoring under the minimum phone error criterion, using the extended Baum-Welch algorithm. The proposed method is evaluated on our Mandarin large-vocabulary audio-visual database. Experimental results show that the proposed method achieves a significant error reduction over the traditional global stream weight based approach and outperforms the minimum classification error based discriminative stream weight training method.
  • Keywords
    audio-visual systems; hidden Markov models; speech recognition; HMM classifier; Mandarin large vocabulary audio-visual database; bimodal integration; discriminative training method; extended Baum Welch algorithm; hidden Markov model; mandarin audio-visual speech recognition; minimum classification error; minimum phone error; state-dependent stream weight training method; stream weight training method; Audio databases; Automatic speech recognition; Hidden Markov models; Lattices; Lips; Maximum likelihood estimation; Speech recognition; Streaming media; Visual databases; Vocabulary; Audio-visual speech recognition (AVSR); Discriminative training; Minimum phone error (MPE);
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
  • Conference_Location
    New York, NY
  • ISSN
    1945-7871
  • Print_ISBN
    978-1-4244-4290-4
  • Electronic_ISBN
    1945-7871
  • Type
    conf
  • DOI
    10.1109/ICME.2009.5202641
  • Filename
    5202641