Dynamic Dependency Tests for Audio-Visual Speaker Association

Author

Siracusa, M.R.; Fisher, John W.

Author_Institution

Comput. Sci. & Artificial Intelligence Lab., MIT, MA, USA

Volume

2

fYear

2007

fDate

15-20 April 2007

Abstract

We formulate the problem of audio-visual speaker association as a dynamic dependency test. That is, given an audio stream and multiple video streams, we wish to determine their dependency structure as it evolves over time. To this end, we propose a hidden factorization Markov model in which the hidden state indexes one of a finite number of possible dependency structures. Each dependency structure has an explicit semantic meaning, namely "who is speaking". The model exploits both the structural and the parametric changes associated with a change of speaker, in contrast to standard sliding-window dependence analysis. Using this model, we obtain state-of-the-art performance on an audio-visual association task without the benefit of training data.
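The central idea of the abstract, a Markov chain whose hidden state selects one of finitely many dependency structures, can be illustrated with standard HMM forward filtering. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the per-frame log-likelihood of the observations under each candidate structure is already available (the hypothetical input `log_lik`), and it uses a hypothetical "sticky" transition matrix controlled by `p_stay`. The function name `filter_dependency_structures` is illustrative only.

```python
import numpy as np


def filter_dependency_structures(log_lik, p_stay=0.95, prior=None):
    """Forward filtering over a finite set of dependency structures (sketch).

    log_lik: array of shape (T, K). log_lik[t, k] is an ASSUMED, precomputed
        log-likelihood of frame t's audio/video observations under
        structure k (e.g. "the audio depends on video stream k").
    p_stay: hypothetical probability of keeping the same structure
        (i.e. the same speaker) from one frame to the next.
    prior: optional initial distribution over the K structures.

    Returns an array of shape (T, K) of filtered posteriors p(s_t = k | x_1..t).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    T, K = log_lik.shape

    # Sticky, row-stochastic transition matrix: stay with probability p_stay,
    # otherwise switch uniformly to one of the other K - 1 structures.
    if K > 1:
        A = np.full((K, K), (1.0 - p_stay) / (K - 1))
        np.fill_diagonal(A, p_stay)
    else:
        A = np.ones((1, 1))

    pred = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    post = np.zeros((T, K))
    for t in range(T):
        # Subtract the row maximum before exponentiating for numerical stability;
        # the constant cancels when the posterior is normalized.
        lik = np.exp(log_lik[t] - log_lik[t].max())
        unnorm = pred * lik
        post[t] = unnorm / unnorm.sum()
        # One-step prediction of the structure at frame t + 1.
        pred = post[t] @ A
    return post


if __name__ == "__main__":
    # Two hypothetical structures: frames 0-1 favor structure 0, frames 2-3 favor 1.
    toy = np.array([[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]])
    post = filter_dependency_structures(toy, p_stay=0.9)
    print(post.argmax(axis=1))  # → [0 0 1 1]
```

The sticky transition prior encodes the expectation that the active speaker changes rarely, which smooths frame-level decisions without a fixed analysis window. The sketch deliberately omits how the per-structure likelihoods and their parameters are obtained, which is where the structural and parametric modeling described in the abstract would enter.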

Keywords

Markov processes; audio signal processing; speaker recognition; video signal processing; audio stream; audio-visual speaker association; dynamic dependency tests; hidden factorization Markov model; multiple video streams; Artificial intelligence; Bayesian methods; Computer science; Context modeling; Hidden Markov models; Layout; Random variables; Streaming media; Testing; Training data; Pattern clustering methods;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location

Honolulu, HI

ISSN

1520-6149

Print_ISBN

1-4244-0727-3

Type

conf

DOI

10.1109/ICASSP.2007.366271

Filename

4217444