DocumentCode :
2245781
Title :
Audiovisual speech recognition using multiscale nonlinear image decomposition
Author :
Matthews, Iain ; Bangham, J. Andrew ; Cox, Stephen
Author_Institution :
Sch. of Inf. Syst., East Anglia Univ., Norwich, UK
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
38
Abstract :
There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a "sieve", applied to the task of visual speech recognition. Information derived from the mouth region is used in visual and audio-visual speech recognition of a database of the letters A-Z for four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth on a per-frame basis. Results are presented for visual-only, audio-only and a simple audio-visual case.
Keywords :
audio-visual systems; image recognition; image segmentation; speech recognition; visual databases; audiovisual speech recognition; character database; gray-scale pixels; multiscale nonlinear image decomposition; scale histogram; sieve; talker's face; talker's mouth; visual information; window; Audio databases; Data mining; Feature extraction; Gray-scale; Histograms; Image databases; Image decomposition; Mouth; Speech recognition; Visual databases
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607019
Filename :
607019