Title :
A motion feature approach for audio-visual recognition
Author :
Pao, Tsang-Long ; Liao, Wen-Yuan
Author_Institution :
Dept. of Comput. Sci. & Eng., Tatung Univ., Taipei, Taiwan
Abstract :
Automatic speech recognition (ASR) by machine has been a goal and an attractive research area for the past several decades. In recent years, many automatic speech-reading systems that combine audio and visual speech features have been proposed. The objective of all such audio-visual speech recognizers is to improve recognition accuracy, particularly in difficult conditions. In this paper, we focus on visual feature extraction for audio-visual recognition. Audio-visual recognition consists of two main steps: feature extraction and recognition. In the proposed approach, we extract the visual motion feature of the lips in the front-end processing. In the post-processing, a Gaussian mixture model (GMM) is used for audio-visual speech recognition. We study and apply this method in the proposed system and report some preliminary experiments. Conclusions are also discussed.
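The two steps named in the abstract (motion feature extraction, then GMM-based recognition) can be illustrated with a minimal sketch. The abstract does not specify how the lip motion feature is computed, so this example assumes simple frame differencing over a grayscale lip region; the one-dimensional GMMs, their parameters, and the class labels "open" and "closed" are hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
import math

def motion_feature(prev_frame, next_frame):
    """Mean absolute intensity change between two equally sized grayscale frames.
    Frame differencing is an assumed stand-in for the paper's motion feature."""
    total, count = 0.0, 0
    for row_a, row_b in zip(prev_frame, next_frame):
        for a, b in zip(row_a, row_b):
            total += abs(b - a)
            count += 1
    return total / count

def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_log_likelihood(xs, components):
    """Log-likelihood of scalar features xs under a GMM.
    components is a list of (weight, mean, variance) tuples."""
    total = 0.0
    for x in xs:
        terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in components]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def classify(xs, models):
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(xs, models[label]))

# Hypothetical 2x2 lip-region frames (grayscale intensities).
frames = [
    [[0, 0], [0, 0]],
    [[4, 0], [0, 0]],
    [[4, 8], [0, 0]],
]
features = [motion_feature(a, b) for a, b in zip(frames, frames[1:])]

# Hypothetical per-class GMMs; real models would be trained on labeled data.
models = {
    "open": [(0.5, 1.0, 0.5), (0.5, 2.0, 0.5)],
    "closed": [(1.0, 0.0, 0.5)],
}
label = classify(features, models)
```

In this sketch the GMM parameters are fixed by hand; in practice they would be estimated from training data (for example with expectation-maximization), and the visual features would be combined with audio features before or after scoring.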
Keywords :
Gaussian processes; audio-visual systems; feature extraction; speech recognition; Gaussian mixture model; audio-visual recognition; audio-visual speech features; automatic speech recognition; feature recognition; motion feature; recognition accuracy; visual feature extraction; Auditory system; Automatic speech recognition; Computer science; Data acquisition; Feature extraction; Humans; Pattern recognition; Speech analysis; Speech processing; Speech recognition;
Conference_Title :
Circuits and Systems, 2005. 48th Midwest Symposium on
Print_ISBN :
0-7803-9197-7
DOI :
10.1109/MWSCAS.2005.1594127