Supervised domain adaptation for emotion recognition from speech

Author

Abdelwahab, Mohammed ; Busso, Carlos

Author_Institution

Dept. of Electr. Eng., Univ. of Texas at Dallas, Richardson, TX, USA

fYear

2015

fDate

19-24 April 2015

Firstpage

5058

Lastpage

5062

Abstract

One of the main barriers to deploying speech emotion recognition systems in real applications is the lack of generalization of the emotion classifiers. The recognition performance achieved on controlled recordings drops when the models are tested with different speakers, channels, environments and domain conditions. This paper explores supervised model adaptation, which can improve the performance of systems evaluated with mismatched training and testing conditions. We address the following key questions in the context of supervised adaptation for speech emotion recognition: (a) how much labeled data is needed for adaptation to achieve good performance? (b) how important is speaker diversity in the labeled set? (c) can spontaneous acted data provide performance similar to that of naturalistic non-acted recordings? and (d) what is the best approach to adapt the models (domain adaptation versus incremental/online training)? We address these questions by using a multi-corpus framework where the models are trained and tested with different databases. The results indicate that even a small portion of data used for adaptation can significantly improve the performance. Increasing the speaker diversity in the labeled data used for adaptation does not provide a significant gain in performance. Also, we observe similar performance when the classifiers are trained with naturalistic non-acted data and spontaneous acted data.

Keywords

emotion recognition; pattern classification; speaker recognition; emotion classifier generalization; mismatched testing condition; mismatched training condition; multicorpus framework; speaker diversity; speech emotion recognition system; supervised domain adaptation; Adaptation models; Databases; Emotion recognition; Speech; Speech recognition; Support vector machines; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178934

Filename

7178934