CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition

Author

Zhao, Xuran ; Evans, Nicholas ; Dugelay, Jean-Luc

Author_Institution

Dept. of Multimedia Commun., EURECOM, Sophia-Antipolis, France

fYear

2012

fDate

9-13 July 2012

Firstpage

356

Lastpage

361

Abstract

Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.
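
The co-training idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' co-LDA algorithm: a nearest-centroid classifier stands in for the LDA subspaces, the data are synthetic rather than MOBIO, and every name (`fit_centroids`, `predict_with_confidence`, `co_train`, `rounds`, `per_round`) is hypothetical. It shows only the generic loop, in which each view in turn labels the unlabelled examples it is most confident about and adds them to a pool shared with the other view.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Mean feature vector per class (assumes each class has >= 1 example)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_with_confidence(X, centroids):
    """Nearest-centroid label plus a margin-based confidence score."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    ordered = np.sort(d, axis=1)
    labels = d.argmin(axis=1)
    confidence = ordered[:, 1] - ordered[:, 0]  # gap between best and runner-up
    return labels, confidence

def co_train(Xa, Xv, y, labelled, n_classes, rounds=5, per_round=10):
    """Generic two-view co-training (Xa: audio view, Xv: visual view).

    `y` is trusted only at the indices in `labelled`; other entries are
    overwritten with pseudo-labels as the loop proceeds.
    """
    y_work = y.copy()
    known = np.zeros(len(y), dtype=bool)
    known[labelled] = True
    for _ in range(rounds):
        for X_teacher in (Xa, Xv):  # the views take turns as teacher
            pool = np.flatnonzero(~known)
            if pool.size == 0:
                break
            cents = fit_centroids(X_teacher[known], y_work[known], n_classes)
            labels, conf = predict_with_confidence(X_teacher[pool], cents)
            top = np.argsort(conf)[::-1][:per_round]  # most confident first
            y_work[pool[top]] = labels[top]
            known[pool[top]] = True  # shared pool: the other view trains on these
    return y_work, known

# Synthetic two-view data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 40
y_true = np.repeat(np.arange(n_classes), n_per_class)
centres_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
centres_v = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0]])
Xa = centres_a[y_true] + rng.normal(scale=0.5, size=(len(y_true), 2))
Xv = centres_v[y_true] + rng.normal(scale=0.5, size=(len(y_true), 2))

labelled = np.array([0, 1, 40, 41, 80, 81])  # two labelled examples per identity
y_init = np.full(len(y_true), -1)
y_init[labelled] = y_true[labelled]

y_out, known = co_train(Xa, Xv, y_init, labelled, n_classes)
print("labelled + pseudo-labelled examples:", int(known.sum()))
```

In this sketch the pseudo-labels from one view immediately enlarge the training set of the other, which is where the sufficiency and independency assumptions mentioned in the abstract matter: each view must be able to classify on its own, and its confident examples should be informative for the other view.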

Keywords

audio-visual systems; biometrics (access control); data acquisition; face recognition; feature extraction; image representation; learning (artificial intelligence); speaker recognition; visual databases; MOBIO database; audio-visual biometric system; audio-visual person recognition; automatic face recognition; automatic speaker recognition; baseline identification rate; client model; co-LDA algorithm; co-training system; data acquisition; feature independency; feature sufficiency; labelled data; semi-supervised machine learning; unlabelled data; variational data representation; visual biometric feature; vocal biometric feature; Adaptation models; Data models; Face; Feature extraction; Training data; Vectors; Videos; Biometrics; audio-visual person recognition; co-training; face recognition; semi-supervised learning; speaker recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia and Expo (ICME), 2012 IEEE International Conference on

Conference_Location

Melbourne, VIC

ISSN

1945-7871

Print_ISBN

978-1-4673-1659-0

Type

conf

DOI

10.1109/ICME.2012.14

Filename

6298423