Regional Information Center for Science and Technology - Multimodal speaker detection using error feedback dynamic Bayesian networks

DocumentCode :

3782934

Title :

Multimodal speaker detection using error feedback dynamic Bayesian networks

Author :

V. Pavlovic;A. Garg;J.M. Rehg;T.S. Huang

Author_Institution :

Res. Lab., Compaq Comput. Corp., Cambridge, MA, USA

Volume :

fYear :

2000

Firstpage :

Abstract :

The design and development of novel human-computer interfaces pose a challenging problem: the actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs), which allow the power of statistical inference and learning to be combined with contextual knowledge of the problem. Unfortunately, simple learning methods can cause such appealing models to fail when the data exhibit complex behavior. We formulate a learning framework for DBNs based on error feedback and statistical boosting theory. We apply this framework to the problem of audio/visual speaker detection in an interactive kiosk environment using "off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors). Detection results obtained in this setup demonstrate the superiority of our learning framework over classical maximum-likelihood (ML) learning in DBNs.
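The abstract names statistical boosting, but this record does not reproduce the algorithm. As a rough illustration of the generic boosting idea only, the sketch below runs discrete AdaBoost over single-sensor decision stumps: samples that the current weak detector misclassifies receive more weight in the next round. It is not the paper's error-feedback DBN procedure. The sensor ordering, the function names, and the +1/-1 label encoding are all assumptions made for this example.

```python
import math

# Assumed ordering of the binary sensor outputs in each sample tuple.
SENSORS = ["face", "skin", "texture", "mouth_motion", "silence"]


def stump_predict(sample, sensor_idx, polarity):
    """Vote +1 (speaker) or -1 (no speaker) from a single binary sensor."""
    vote = 1 if sample[sensor_idx] == 1 else -1
    return polarity * vote


def train_adaboost(samples, labels, rounds=3):
    """Generic discrete AdaBoost; returns a list of (alpha, sensor_idx, polarity)."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the sensor and polarity with the lowest weighted error.
        best = None
        for idx in range(len(samples[0])):
            for polarity in (1, -1):
                err = sum(
                    w
                    for w, x, y in zip(weights, samples, labels)
                    if stump_predict(x, idx, polarity) != y
                )
                if best is None or err < best[0]:
                    best = (err, idx, polarity)
        err, idx, polarity = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, idx, polarity))
        # Error feedback: raise the weight of misclassified samples.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, idx, polarity))
            for w, x, y in zip(weights, samples, labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, sample):
    """Weighted vote of all stumps; +1 means a speaker is detected."""
    score = sum(a * stump_predict(sample, i, p) for a, i, p in ensemble)
    return 1 if score >= 0 else -1
```

Under these assumptions, `train_adaboost` takes a list of 5-tuples of 0/1 sensor readings and a parallel list of +1/-1 labels, and `predict` applies the learned weighted vote to a new reading.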

Keywords :

"Feedback","Bayesian methods","Face detection","Motion detection","Loudspeakers","Acoustic noise","Sensor fusion","Acoustic sensors","Learning systems","Boosting"

Publisher :

IEEE

Conference_Title :

Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on

ISSN :

1063-6919

Print_ISBN :

0-7695-0662-3

Type :

conf

DOI :

10.1109/CVPR.2000.854730

Filename :

854730

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3782934