Regional Information Center for Science and Technology - Audiovisual speech recognition using multiscale nonlinear image decomposition

DocumentCode :

2245781

Title :

Audiovisual speech recognition using multiscale nonlinear image decomposition

Author :

Matthews, Iain ; Bangham, J. Andrew ; Cox, Stephen

Author_Institution :

Sch. of Inf. Syst., East Anglia Univ., Norwich, UK

Volume :

fYear :

1996

fDate :

3-6 Oct 1996

Firstpage :

Abstract :

There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a “sieve”, applied to the task of visual speech recognition. Information derived from the mouth region is used in visual and audio-visual speech recognition of a database of the letters A-Z for four talkers. A scale histogram is generated directly from the gray-scale pixels of a window containing the talker's mouth on a per-frame basis. Results are presented for visual-only, audio-only and a simple audio-visual case.

Keywords :

audio-visual systems; image recognition; image segmentation; speech recognition; visual databases; audiovisual speech recognition; character database; gray-scale pixels; multiscale nonlinear image decomposition; scale histogram; sieve; talker's face; talker's mouth; visual information; window; Audio databases; Data mining; Feature extraction; Gray-scale; Histograms; Image databases; Image decomposition; Mouth; Speech recognition; Visual databases

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location :

Philadelphia, PA

Print_ISBN :

0-7803-3555-4

Type :

conf

DOI :

10.1109/ICSLP.1996.607019

Filename :

607019

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2245781