• DocumentCode
    1858210
  • Title
    Discriminative training of HMM stream exponents for audio-visual speech recognition
  • Author
    Potamianos, Gerasimos; Graf, Hans Peter
  • Author_Institution
    AT&T Labs., Florham Park, NJ, USA
  • Volume
    6
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    3733
  • Abstract
    We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to train, respectively, audio-only and visual-only single-stream HMMs of identical topology by maximum likelihood. A two-stream HMM is then obtained by combining the two single-stream HMMs and introducing exponents that weight the log-likelihood of each stream. We present the GPD algorithm for stream exponent estimation, consider a possible initialization, and apply it to the single-speaker connected-letters task of the AT&T bimodal database. We demonstrate that the resulting multi-stream HMM outperforms the audio-only, visual-only, and audio-visual single-stream HMMs.
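The abstract describes a two-stream HMM whose score is an exponent-weighted sum of per-stream log-likelihoods, with the exponents estimated by GPD. The following is a minimal Python sketch of one GPD update under assumptions this record does not state: per-class stream log-likelihoods are precomputed along fixed state alignments (so the combined score is linear in the exponents), the misclassification measure and sigmoid loss follow the usual minimum-classification-error formulation, and the exponents are constrained to be non-negative and to sum to one. The names `combined_score`, `gpd_step`, `eta`, `gamma`, and `step` are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def combined_score(exponents: Sequence[float], stream_ll: Sequence[float]) -> float:
    """Two-stream score: sum over streams s of lambda_s * log-likelihood_s."""
    return sum(lam * ll for lam, ll in zip(exponents, stream_ll))


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gpd_step(exponents: Sequence[float],
             stream_ll: Sequence[Sequence[float]],
             correct: int,
             eta: float = 1.0,
             gamma: float = 1.0,
             step: float = 0.01) -> Tuple[List[float], float]:
    """One GPD update of the stream exponents for a single training token.

    stream_ll[k][s] is the (assumed precomputed) log-likelihood of class k
    in stream s, e.g. s = 0 for audio and s = 1 for visual.
    """
    scores = [combined_score(exponents, ll) for ll in stream_ll]
    rivals = [k for k in range(len(scores)) if k != correct]
    # Smoothed maximum over competing classes via a shifted log-sum-exp.
    m = max(eta * scores[k] for k in rivals)
    weights = [math.exp(eta * scores[k] - m) for k in rivals]
    total = sum(weights)
    anti = (m + math.log(total / len(rivals))) / eta
    d = -scores[correct] + anti          # misclassification measure
    loss = _sigmoid(gamma * d)           # smoothed 0-1 loss
    dloss_dd = gamma * loss * (1.0 - loss)
    updated = []
    for s, lam in enumerate(exponents):
        # d(d)/d(lambda_s) = -L_correct,s + softmax-weighted mean of rival L_k,s
        rival_term = sum(w * stream_ll[k][s] for w, k in zip(weights, rivals)) / total
        grad = dloss_dd * (-stream_ll[correct][s] + rival_term)
        updated.append(lam - step * grad)
    # Assumed constraints: clip to non-negative, then renormalize to sum to one.
    updated = [max(lam, 0.0) for lam in updated]
    norm = sum(updated)
    if norm <= 0.0:
        return list(exponents), loss
    return [lam / norm for lam in updated], loss
```

For example, `gpd_step([0.5, 0.5], [(-10.0, -5.0), (-9.0, -8.0)], correct=0)` shifts weight toward the visual stream, because in this toy token the visual log-likelihoods separate the correct class from its rival more strongly than the audio ones do.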
  • Keywords
    audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; probability; speech recognition; synchronisation; AT&T bimodal database; HMM stream exponents; audio features; audio-only stream; audio-visual speech recognition; discriminative training; generalized probabilistic descent algorithm; hidden Markov model; initialization; log-likelihood; maximum likelihood; single speaker connected letters task; stream exponent estimation; synchronized features; two-stream HMM; visual features; visual-only stream; Automatic speech recognition; Hidden Markov models; Lips; Mutual information; Speech recognition; Streaming media; Testing; Topology; Visual databases; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.679695
  • Filename
    679695