Title :
Audio-visual emotion recognition using an emotion space concept
Author :
Kanluan, Ittipan ; Grimm, Michael ; Kroschel, Kristian
Author_Institution :
Inst. für Nachrichtentechnik, Univ. Karlsruhe (TH), Karlsruhe, Germany
Abstract :
In this paper, we present novel methods for estimating spontaneously expressed emotions from audio-visual information. Emotions are described by three continuous-valued emotion primitives, namely valence, activation, and dominance, in a 3D emotion space. We used prosodic and spectral features to represent the audio characteristics of the emotional speech. To extract visual features, the two-dimensional Discrete Cosine Transform (2D-DCT) was applied to blocks of a predefined size in facial images. Support Vector Regression (SVR), the regression variant of Support Vector Machines (SVM), was used to estimate the three emotion primitives. The results showed that activation and dominance are best estimated from acoustic features, whereas valence is best estimated from visual features. The two monomodal estimates were subsequently fused at the decision level by a weighted linear combination. The average estimation error of the fused result was 17.6% and 12.7% below the individual errors of the acoustic and visual emotion recognition, respectively. The correlation between the emotion estimates and the manual reference increased by 12.3% and 9.0%, respectively.
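The decision-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the two monomodal estimates are combined by a weighted linear combination; the function name `fuse_estimates`, the per-primitive weights, the constraint that the acoustic and visual weights sum to one, and all numeric values below are assumptions made for illustration, not values from the paper.

```python
def fuse_estimates(acoustic, visual, weights):
    """Fuse two monomodal estimates by a weighted linear combination.

    acoustic, visual: estimates of (valence, activation, dominance).
    weights: assumed weight of the acoustic estimate for each primitive;
             the visual estimate receives (1 - weight).
    """
    return [w * a + (1.0 - w) * v for a, v, w in zip(acoustic, visual, weights)]


# Hypothetical estimates for one utterance: (valence, activation, dominance).
acoustic_est = [0.2, 0.6, 0.4]
visual_est = [0.6, 0.2, 0.0]

# Hypothetical weights: valence leans on the visual estimate, while
# activation and dominance lean on the acoustic estimate.
acoustic_weights = [0.25, 0.75, 0.75]

fused = fuse_estimates(acoustic_est, visual_est, acoustic_weights)
print(fused)
```

In practice such weights would presumably be tuned on held-out data; the values here only mirror the abstract's finding that valence is better estimated visually and the other two primitives acoustically.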
Keywords :
audio-visual systems; discrete cosine transforms; emotion recognition; feature extraction; regression analysis; support vector machines; 2D discrete cosine transform; 2D-DCT; 3D emotion space; SVM; SVR; acoustic feature extraction; audio characteristics representation; audio-visual emotion recognition; emotion primitive activation; emotion primitive dominance; emotion primitive estimation; estimation error; facial image; monomodal emotion estimation; prosodic feature; spectral feature; support vector machine; support vector regression; valence; visual feature extraction; Acoustics; Correlation; Emotion recognition; Estimation; Feature extraction; Speech; Visualization;
Conference_Title :
Signal Processing Conference, 2008 16th European
Conference_Location :
Lausanne