• DocumentCode
    569149
  • Title
    CO-LDA: A Semi-supervised Approach to Audio-Visual Person Recognition
  • Author
    Zhao, Xuran ; Evans, Nicholas ; Dugelay, Jean-Luc
  • Author_Institution
    Dept. of Multimedia Commun., EURECOM, Sophia-Antipolis, France
  • fYear
    2012
  • fDate
    9-13 July 2012
  • Firstpage
    356
  • Lastpage
    361
  • Abstract
    Client models used in Automatic Speaker Recognition (ASR) and Automatic Face Recognition (AFR) are usually trained with labelled data acquired in a small number of enrolment sessions. The amount of training data is rarely sufficient to reliably represent the variation which occurs later during testing. Larger quantities of client-specific training data can always be obtained, but manual collection and labelling is often cost-prohibitive. Co-training, a paradigm of semi-supervised machine learning, can exploit unlabelled data to enhance weakly learned client models. In this paper, we propose a co-LDA algorithm which uses both labelled and unlabelled data to capture greater intersession variation and to learn discriminative subspaces in which test examples can be more accurately classified. The proposed algorithm is naturally suited to audio-visual person recognition because vocal and visual biometric features intrinsically satisfy the assumptions of feature sufficiency and independency which guarantee the effectiveness of co-training. When tested on the MOBIO database, the proposed co-training system raises a baseline identification rate from 71% to 99%, while in a verification task the Equal Error Rate (EER) is reduced from 18% to about 1%. To our knowledge, this is the first successful application of co-training in audio-visual biometric systems.
  • Keywords
    audio-visual systems; biometrics (access control); data acquisition; face recognition; feature extraction; image representation; learning (artificial intelligence); speaker recognition; visual databases; MOBIO database; audio-visual biometric system; audio-visual person recognition; automatic face recognition; automatic speaker recognition; baseline identification rate; client model; co-LDA algorithm; co-training system; data acquisition; feature independency; feature sufficiency; labelled data; semi-supervised machine learning; unlabelled data; variational data representation; visual biometric feature; vocal biometric feature; Adaptation models; Data models; Face; Feature extraction; Training data; Vectors; Videos; Biometrics; audio-visual person recognition; co-training; face recognition; semi-supervised learning; speaker recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo (ICME), 2012 IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • ISSN
    1945-7871
  • Print_ISBN
    978-1-4673-1659-0
  • Type
    conf
  • DOI
    10.1109/ICME.2012.14
  • Filename
    6298423