Regional Information Center for Science and Technology - Recent advances in the automatic recognition of audiovisual speech

DocumentCode :

778626

Title :

Recent advances in the automatic recognition of audiovisual speech

Author :

Potamianos, Gerasimos ; Neti, Chalapathy ; Gravier, Guillaume ; Garg, Ashutosh ; Senior, Andrew W.

Author_Institution :

Human Language Technol. Dept., IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

Issue :

fYear :

2003

Firstpage :

1306

Lastpage :

1326

Abstract :

Visual speech information from the speaker's mouth region has been successfully shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves ASR over all conditions and data considered, though less so for visually challenging environments and large-vocabulary tasks.
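Decision fusion that uses modality reliability estimates is often realized as a log-linear combination of audio and visual stream log-likelihoods. Each stream's log-likelihood is scaled by an exponent, and the exponents sum to one. The sketch below illustrates that idea for scoring a fixed set of classes. The function names, the sigmoid mapping from a scalar reliability value to the audio weight, and its `slope`/`offset` parameters are hypothetical assumptions chosen for illustration. They are not taken from the paper.

```python
import math
from typing import List, Sequence


def fused_scores(audio_ll: Sequence[float], visual_ll: Sequence[float],
                 audio_weight: float) -> List[float]:
    """Log-linear (stream-exponent) fusion of per-class log-likelihoods.

    audio_weight is the audio exponent lambda_A in [0, 1]; the visual
    exponent is 1 - lambda_A, so the two exponents sum to one.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if len(audio_ll) != len(visual_ll):
        raise ValueError("both streams must score the same classes")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_ll, visual_ll)]


def weight_from_reliability(reliability: float, slope: float = 1.0,
                            offset: float = 0.0) -> float:
    """Hypothetical sigmoid mapping from an audio-reliability estimate
    (for example, a value that grows with SNR) to the audio exponent."""
    return 1.0 / (1.0 + math.exp(-slope * (reliability - offset)))


def recognize(audio_ll: Sequence[float], visual_ll: Sequence[float],
              reliability: float) -> int:
    """Return the index of the class with the highest fused score."""
    scores = fused_scores(audio_ll, visual_ll,
                          weight_from_reliability(reliability))
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    audio = [-10.0, -12.0]   # audio stream favours class 0
    visual = [-9.0, -5.0]    # visual stream favours class 1
    print(recognize(audio, visual, reliability=10.0))   # reliable audio
    print(recognize(audio, visual, reliability=-10.0))  # noisy audio
```

When the reliability estimate is high, the audio exponent approaches one and the audio decision dominates. When the audio is judged unreliable, as in low-SNR conditions, the weight shifts toward the visual stream. This reflects the abstract's point that the visual modality helps most when the acoustic channel is degraded.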

Keywords :

feature extraction; hidden Markov models; speech recognition; audiovisual speech integration; automatic audiovisual speech recognition; bimodal recognition process; human computer interface; modality reliability estimates; multimedia databases; noise robustness; stream reliability; video region of interest; visual feature extraction; visual front-end design; visual speech information; Automatic speech recognition; Computer interfaces; Humans; Mouth; Noise robustness; Spatial databases; Speech enhancement; Speech processing; Speech recognition; Usability

fLanguage :

English

Journal_Title :

Proceedings of the IEEE

Publisher :

IEEE

ISSN :

0018-9219

Type :

jour

DOI :

10.1109/JPROC.2003.817150

Filename :

1230212

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=778626