• DocumentCode
    417673
  • Title
    Audio visual word spotting
  • Author
    Liu, Ming; Xiong, Ziyou; Chu, Stephen M.; Zhang, Zhenqiu; Huang, Thomas S.
  • Author_Institution
    Beckman Inst. for Adv. Sci. & Technol., Illinois Univ., Urbana, IL, USA
  • Volume
    3
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    The task of word spotting is to detect and verify specific words embedded in unconstrained speech. Most word spotters based on hidden Markov models (HMMs) share the noise robustness problem of speech recognizers: their performance drops significantly in noisy environments. Visual speech information has been shown to improve the noise robustness of speech recognizers (Neti, C. et al., 2000; Nefian, A.V. et al., 2002; Potamianos, G. et al., 2003). We add visual speech information to improve the noise robustness of the word spotter. In visual frontend processing, the information-based maximum discrimination (IBMD) algorithm (Colmenarez, A. and Huang, T.S., 1997) is used to detect the face and mouth corners. For audio-visual fusion, feature-level fusion is adopted. We compare the audio-visual word spotter with the audio-only spotter and show the advantage of the former over the latter.
  • Keywords
    acoustic noise; audio-visual systems; face recognition; feature extraction; hidden Markov models; object detection; random noise; sensor fusion; speech recognition; HMM; audio visual word spotting; audio-visual fusion; face detection; feature-level fusion; information-based maximum discrimination algorithm; mouth corner detection; noise robustness; speech recognizer; unconstrained speech; visual frontend processing; visual speech information; Humans; Mouth; Speech enhancement; Vocabulary; Working environment noise
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326662
  • Filename
    1326662