• DocumentCode
    2159361
  • Title
    Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors
  • Author
    Li, Ming ; Narayanan, Shrikanth
  • Author_Institution
    Signal Anal. & Interpretation Lab., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    1481
  • Lastpage
    1484
  • Abstract
    Systems based on block-wise local features and Gaussian mixture models (GMM) have previously been shown to suit video-based talking face verification because they offer the best trade-off among complexity, robustness, and performance. In this paper, we propose two methods to enhance the robustness and performance of the GMM-ZTnorm baseline system. First, joint factor analysis is performed to compensate for session variability caused by different recording devices, lighting conditions, facial expressions, etc. Second, the difference between the universal background model (UBM) and the maximum a posteriori (MAP) adapted model is mapped into a GMM mean shifted supervector, whose over-complete dictionary becomes more incoherent. For verification, the sparse representation computed by l1-minimization with quadratic constraints is then used to model these GMM mean shifted supervectors. Experimental results show that the proposed system achieved equal error rates of 8.4% (group 1) and 10.5% (group 2) on the BANCA talking face video database following the P protocol, a relative error reduction of more than 20% over the GMM-ZTnorm baseline.
  • Keywords
    Gaussian processes; face recognition; image representation; image sequences; maximum likelihood estimation; protocols; quadratic programming; speaker recognition; video databases; GMM; GMM-ZTnorm baseline system; Gaussian mixture models; P protocol; face video database; joint factor analysis; maximum a posteriori; mean shifted super vectors; quadratic constraints; sparse representation; talking face video verification; universal background model; Adaptation models; Dictionaries; Face; Protocols; Robustness; Support vector machines; Training; GMM supervector; face video recognition; joint factor analysis; sparse representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5946773
  • Filename
    5946773
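
The abstract in this record reports results as equal error rate (EER) and relative error reduction. As a reading aid, the minimal sketch below shows how these two metrics are commonly computed from verification scores. It is not the authors' evaluation code: the function names, the accept-if-score-at-or-above-threshold convention, and the toy scores are all assumptions made for illustration, and none of the numbers are taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning the observed scores as thresholds.

    Assumed convention: a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_error_reduction(baseline_eer, system_eer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_eer - system_eer) / baseline_eer


# Hypothetical toy scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)
```

With real score lists, FAR and FRR rarely cross exactly at an observed score, so evaluation toolkits often interpolate between neighboring thresholds. This sketch simply averages FAR and FRR at the closest point.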