• DocumentCode
    357107
  • Title
    A new approach to integrate audio and visual features of speech
  • Author
    Pan, Hao ; Liang, Zhi-Pei ; Huang, Thomas S.
  • Author_Institution
    Beckman Inst. for Adv. Sci. & Technol., Illinois Univ., Urbana, IL, USA
  • Volume
    2
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1093
  • Abstract
    This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observations to infer the dependencies between audio and video. The learning and inference algorithms described in this paper can handle audio and video features with different data rates and durations. In speaker verification experiments, the proposed method significantly reduces the recognition error rate compared with unimodal HMMs and other simpler fusion methods.
  • Keywords
    audio signal processing; hidden Markov models; inference mechanisms; learning (artificial intelligence); maximum entropy methods; speaker recognition; video signal processing; audio hidden states; audio/visual feature integration; fused-hidden Markov model; inference algorithms; learning algorithms; maximum entropy; probabilistic fusion method; recognition error rate; speaker verification experiments; speech; visual observations; Entropy; Error analysis; Hidden Markov models; Human computer interaction; Inference algorithms; Sampling methods; Speech recognition; TV
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
  • Conference_Location
    New York, NY
  • Print_ISBN
    0-7803-6536-4
  • Type
    conf
  • DOI
    10.1109/ICME.2000.871551
  • Filename
    871551