• DocumentCode
    3412550
  • Title

    Automatic speech recognition system using acoustic and visual signals

  • Author

    Hennecke, Marcus E. ; Prasad, E. Venkatesh ; Stork, David G.

  • Author_Institution
    Dept. of Electr. Eng., Stanford Univ., CA, USA
  • Volume
    2
  • fYear
    1995
  • fDate
    Oct. 30 1995-Nov. 1 1995
  • Firstpage
    1214
  • Abstract
    Automatic speech-reading systems use both acoustic and visual signals to perform speech recognition. In previous work, we have shown how visual speech can improve recognition accuracy of automatic speech recognition and have described an algorithm based on deformable templates that accurately infers lip dynamics. In this paper we present a complete speech-reading system, which is able to record an utterance using a standard color video camera, preprocess both the audio and video signals, and perform speech recognition. This system is based on new algorithms for finding the talker's face and mouth and an improved template algorithm for tracking the lips. We will also compare the results from our new system with our previous work and discuss various strategies for integration of the two modalities.
  • Keywords
    speech recognition; acoustic signals; automatic speech recognition system; color video camera; deformable templates; image sequence; lip dynamics; speech reading system; template algorithm; visual signals; Automatic speech recognition; Cameras; Data mining; Feature extraction; Hidden Markov models; Lips; Mouth; Neural networks; Skin; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 1995. 1995 Conference Record of the Twenty-Ninth Asilomar Conference on
  • Conference_Location
    Pacific Grove, CA, USA
  • ISSN
    1058-6393
  • Print_ISBN
    0-8186-7370-2
  • Type

    conf

  • DOI
    10.1109/ACSSC.1995.540892
  • Filename
    540892