A motion feature approach for audio-visual recognition

Author

Pao, Tsang-Long ; Liao, Wen-Yuan

Author_Institution

Dept. of Comput. Sci. & Eng., Tatung Univ., Taipei, Taiwan

fYear

2005

fDate

7-10 Aug. 2005

Firstpage

421

Abstract

Automatic speech recognition (ASR) has been a long-standing goal and an active research area for the past several decades. In recent years, many automatic speech-reading systems that combine audio and visual speech features have been proposed. The objective of all such audio-visual speech recognizers is to improve recognition accuracy, particularly under difficult conditions. In this paper, we focus on visual feature extraction for audio-visual recognition. Audio-visual recognition consists of two main steps: feature extraction and recognition. In the proposed approach, we extract a visual motion feature of the lips for front-end processing. In the recognition stage, a Gaussian mixture model (GMM) is used for audio-visual speech recognition. We apply this method in the proposed system and report some preliminary experiments. Conclusions are also discussed.
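The abstract names two stages (lip motion feature extraction, then GMM-based recognition) but does not specify either one. As an illustration only, they can be sketched as below under stated assumptions: the motion feature is assumed to be the mean absolute difference between consecutive grayscale frames of a lip region, pooled over a 2 x 2 grid, and recognition is assumed to use one diagonal-covariance GMM per word class, trained with EM, with the class chosen by maximum total log-likelihood. All names (lip_motion_features, DiagGMM, train_models, classify) are hypothetical and are not taken from the paper. Audio features are omitted; time-aligned audio feature vectors could be concatenated with the visual ones before training, which is an early-fusion assumption rather than the authors' stated design.

```python
import numpy as np


def lip_motion_features(frames, grid=(2, 2)):
    """Hypothetical lip motion feature (an assumption, not the paper's):
    mean absolute difference between consecutive grayscale frames of a lip
    region, averaged within each cell of a coarse grid.
    frames has shape (T, H, W); the result has shape (T-1, gh*gw)."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))  # (T-1, H, W) motion energy
    rows = np.array_split(np.arange(diffs.shape[1]), grid[0])
    cols = np.array_split(np.arange(diffs.shape[2]), grid[1])
    cells = [diffs[:, r][:, :, c].mean(axis=(1, 2)) for r in rows for c in cols]
    return np.stack(cells, axis=1)


def _logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1)
    return m + np.log(np.exp(a - m[:, None]).sum(axis=1))


class DiagGMM:
    """Gaussian mixture model with diagonal covariances, fitted by EM."""

    def __init__(self, n_components=2, n_iter=50, reg=1e-3, seed=0):
        self.k, self.n_iter, self.reg, self.seed = n_components, n_iter, reg, seed

    def _log_joint(self, X):
        # log w_k + log N(x_n | mu_k, diag(var_k)) for every frame n and component k
        diff = X[:, None, :] - self.means[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars)[None] + diff**2 / self.vars[None]).sum(axis=2)
        return log_pdf + np.log(self.weights)[None]

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        n = X.shape[0]
        rng = np.random.default_rng(self.seed)
        self.means = X[rng.choice(n, self.k, replace=False)]  # init means from random frames
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame
            lj = self._log_joint(X)
            resp = np.exp(lj - _logsumexp_rows(lj)[:, None])
            # M-step: re-estimate weights, means and floored variances
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second = (resp.T @ X**2) / nk[:, None]
            self.vars = np.maximum(second - self.means**2, 0.0) + self.reg
        return self

    def score(self, X):
        """Total log-likelihood of a sequence of feature frames."""
        return float(_logsumexp_rows(self._log_joint(np.asarray(X, dtype=float))).sum())


def train_models(frames_by_label, **gmm_kwargs):
    """Fit one GMM per class label from a stacked (N, D) feature matrix."""
    return {label: DiagGMM(**gmm_kwargs).fit(X) for label, X in frames_by_label.items()}


def classify(models, X):
    """Return the label whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda label: models[label].score(X))
```

For example, train_models({"yes": X_yes, "no": X_no}) would fit one model per (hypothetical) word from stacked per-frame feature matrices, and classify(models, lip_motion_features(clip)) would return the label whose model scores the clip highest. Frame differencing is used here only because it is one of the simplest motion measures; the paper's actual motion feature may differ.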

Keywords

Gaussian processes; audio-visual systems; feature extraction; speech recognition; Gaussian mixture model; audio-visual recognition; audio-visual speech features; automatic speech recognition; feature recognition; motion feature; recognition accuracy; visual feature extraction; Auditory system; Automatic speech recognition; Computer science; Data acquisition; Feature extraction; Humans; Pattern recognition; Speech analysis; Speech processing; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Title

Circuits and Systems, 2005. 48th Midwest Symposium on

Print_ISBN

0-7803-9197-7

Type

conf

DOI

10.1109/MWSCAS.2005.1594127

Filename

1594127