  • DocumentCode
    2289481
  • Title
    Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice
  • Author
    Metallinou, Angeliki ; Lee, Sungbok ; Narayanan, Shrikanth
  • Author_Institution
    Sch. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
  • fYear
    2008
  • fDate
    15-17 Dec. 2008
  • Firstpage
    250
  • Lastpage
    257
  • Abstract
    Emotion expression associated with human communication is known to be a multimodal process. In this work, we investigate the way that emotional information is conveyed by facial and vocal modalities, and how these modalities can be effectively combined to achieve improved emotion recognition accuracy. In particular, the behaviors of different facial regions are studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), which contains speech and facial marker data. Each individual modality is modeled by Gaussian mixture models (GMMs). Multiple modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post-classification accuracies as features. Individual modality recognition performances indicate that anger and sadness have comparable accuracies for facial and vocal modalities, while happiness seems to be more accurately transmitted by facial expressions than voice. The neutral state has the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy compared to other facial regions. Moreover, classifier combination leads to significantly higher performance, which confirms that training detailed single-modality classifiers and combining them at a later stage is an effective approach.
  • Keywords
    Bayes methods; Gaussian processes; audio-visual systems; emotion recognition; face recognition; speech recognition; support vector machines; Bayesian classifier weighting scheme; Gaussian mixture models; audio-visual emotion recognition; facial modalities; human communication; vocal modalities; Bayesian methods; Data analysis; Emotion recognition; Face recognition; Humans; Spatial databases; Speech analysis; Speech recognition; Support vector machine classification; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-0-7695-3454-1
  • Electronic_ISBN
    978-0-7695-3454-1
  • Type
    conf
  • DOI
    10.1109/ISM.2008.40
  • Filename
    4741177