DocumentCode
3251203
Title
A motion feature approach for audio-visual recognition
Author
Pao, Tsang-Long; Liao, Wen-Yuan
Author_Institution
Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan
fYear
2005
fDate
7-10 Aug. 2005
Firstpage
421
Abstract
Automatic speech recognition (ASR) has been a long-standing goal and an active research area for the past several decades. In recent years, many automatic speech-reading systems that combine audio and visual speech features have been proposed. The objective of all such audio-visual speech recognizers is to improve recognition accuracy, particularly under difficult conditions. In this paper, we focus on visual feature extraction for audio-visual recognition. Audio-visual recognition consists of two main steps: feature extraction and recognition. In the proposed approach, we extract visual motion features of the lips for front-end processing. In post-processing, a Gaussian mixture model (GMM) is used for audio-visual speech recognition. We apply this method in the proposed system and evaluate it with some preliminary experiments. Conclusions are also discussed.
Keywords
Gaussian processes; audio-visual systems; feature extraction; speech recognition; Gaussian mixture model; audio-visual recognition; audio-visual speech features; automatic speech recognition; feature recognition; motion feature; recognition accuracy; visual feature extraction; Auditory system; Automatic speech recognition; Computer science; Data acquisition; Feature extraction; Humans; Pattern recognition; Speech analysis; Speech processing; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Title
48th Midwest Symposium on Circuits and Systems, 2005
Print_ISBN
0-7803-9197-7
Type
conf
DOI
10.1109/MWSCAS.2005.1594127
Filename
1594127