Data selection for acoustic emotion recognition: Analyzing and comparing utterance and sub-utterance selection strategies

Author

Duc Le;Emily Mower Provost

Author_Institution

Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA

fYear

2015

Firstpage

146

Lastpage

152

Abstract

Data selection is an important component of cross-corpus training and semi-supervised/active learning. However, its effect on acoustic emotion recognition is still not well understood. In this work, we perform an in-depth exploration of various data selection strategies for emotion classification from speech using classifier agreement as the selection metric. Our methods span both the traditional utterance level and the less explored sub-utterance level. A median unweighted average recall of 70.68%, comparable to the winner of the 2009 INTERSPEECH Emotion Challenge, was achieved on the FAU Aibo 2-class problem using less than 50% of the training data. Our results indicate that sub-utterance selection leads to slightly faster convergence and significantly more stable learning. In addition, diversifying instances in terms of classifier agreement produces a faster learning rate, whereas selecting those near the median results in higher stability. We show that the selected data instances can be explained intuitively based on their acoustic properties and position within an utterance. Our work helps provide a deeper understanding of the strengths, weaknesses, and trade-offs of different data selection strategies for speech emotion recognition.
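
As a rough illustration of the terms used in the abstract, the sketch below computes unweighted average recall (the mean of per-class recalls) and shows two simple ways to pick instances from a list of classifier-agreement scores. The abstract does not define the agreement metric or the selection procedures, so everything here is an assumption: the scores are taken to be one number per instance (for example, the fraction of classifiers that agree on its label), and the function names `select_near_median` and `select_diverse` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)    # instances per true class
    correct = defaultdict(int)  # correctly predicted instances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def select_near_median(scores, k):
    """Hypothetical strategy: keep the k instances whose agreement
    score lies closest to the median score."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # Rank by distance to the median; break ties by original index.
    ranked = sorted(range(n), key=lambda i: (abs(scores[i] - median), i))
    return sorted(ranked[:k])


def select_diverse(scores, k):
    """Hypothetical strategy: keep k instances spread evenly across
    the sorted range of agreement scores."""
    order = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    n = len(order)
    if k >= n:
        return sorted(order)
    if k == 1:
        return [order[n // 2]]
    # Evenly spaced positions from the lowest to the highest score.
    return sorted(order[j * (n - 1) // (k - 1)] for j in range(k))


if __name__ == "__main__":
    # Toy agreement scores, invented for illustration only.
    scores = [0.1, 0.5, 0.9, 0.4, 0.7]
    print(select_near_median(scores, 2))
    print(select_diverse(scores, 3))
    print(unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1]))
```

Under these assumptions, the near-median strategy concentrates on instances with middling agreement, while the diverse strategy samples the full range from low to high agreement. Unweighted average recall is a natural metric for class-imbalanced problems such as FAU Aibo because a majority-class predictor cannot score well on it.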

Keywords

"Hidden Markov models","Training","Emotion recognition","Speech","Training data","Measurement","Acoustics"

Publisher

IEEE

Conference_Titel

Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on

Electronic_ISBN

2156-8111

Type

conf

DOI

10.1109/ACII.2015.7344564

Filename

7344564