• DocumentCode
    3251203
  • Title
    A motion feature approach for audio-visual recognition
  • Author
    Pao, Tsang-Long ; Liao, Wen-Yuan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Tatung Univ., Taipei, Taiwan
  • fYear
    2005
  • fDate
    7-10 Aug. 2005
  • Firstpage
    421
  • Abstract
    Automatic speech recognition (ASR) has been a goal and an attractive research area for the past several decades. In recent years, many automatic speech-reading systems that combine audio and visual speech features have been proposed. The objective of all such audio-visual speech recognizers is to improve recognition accuracy, particularly under difficult conditions. In this paper, we focus on visual feature extraction for audio-visual recognition, which consists of two main steps: feature extraction and recognition. In the proposed approach, the front-end processing extracts a visual motion feature of the lips. In the post-processing, a Gaussian mixture model (GMM) is used for audio-visual speech recognition. We study and apply this method in the proposed system and report some preliminary experiments. Conclusions are also discussed.
  • Keywords
    Gaussian processes; audio-visual systems; feature extraction; speech recognition; Gaussian mixture model; audio-visual recognition; audio-visual speech features; automatic speech recognition; feature recognition; motion feature; recognition accuracy; visual feature extraction; Auditory system; Automatic speech recognition; Computer science; Data acquisition; Feature extraction; Humans; Pattern recognition; Speech analysis; Speech processing; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems, 2005. 48th Midwest Symposium on
  • Print_ISBN
    0-7803-9197-7
  • Type
    conf
  • DOI
    10.1109/MWSCAS.2005.1594127
  • Filename
    1594127