Title :
Audiovisual speech recognition using multiscale nonlinear image decomposition
Author :
Matthews, Iain ; Bangham, J. Andrew ; Cox, Stephen
Author_Institution :
School of Information Systems, University of East Anglia, Norwich, UK
Abstract :
There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a "sieve", applied to the task of visual speech recognition. Information derived from the mouth region is used in visual and audio-visual speech recognition of a database of the letters A-Z for four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth on a per-frame basis. Results are presented for visual-only, audio-only and a simple audio-visual case.
Keywords :
audio-visual systems; image recognition; image segmentation; speech recognition; visual databases; audiovisual speech recognition; character database; gray-scale pixels; multiscale nonlinear image decomposition; scale histogram; sieve; talker's face; talker's mouth; visual information; window; Audio databases; Data mining; Feature extraction; Gray-scale; Histograms; Image databases; Image decomposition; Mouth; Speech recognition; Visual databases
Conference_Title :
Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), 1996
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607019