Title :
Speech emotion recognition with i-vector feature and RNN model
Author :
Teng Zhang ; Ji Wu
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Abstract :
Machine-based emotion recognition from speech has emerged as an important research area in recent years. However, most studies have been done on artificial data. The recognition task becomes more difficult with natural speech data such as real-world conversations from call centres. Along with that difficulty, natural speech data has some new properties that may be useful for real-world recognition tasks. In this paper, we focus on the recognition task on real-world conversations. Traditional prosodic acoustic features and the novel i-vector features are introduced and compared as more abstract representations of the speech signal. We also propose a Recurrent Neural Network (RNN) approach to map the features to emotion labels. With only prosodic acoustic features and an SVM multi-classifier, we obtain an F-measure of 38.3%. By adding the i-vector features and the RNN model, we achieve a better result of 48.9%.
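As a rough illustration of the metric the abstract reports, the sketch below computes an F-measure for a multi-class emotion labelling task. The abstract does not say how per-class scores are averaged, so the unweighted (macro) average used here is an assumption, and the function name `macro_f_measure` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the paper.
y_true = ["neutral", "angry", "happy", "neutral", "angry", "happy"]
y_pred = ["neutral", "angry", "neutral", "neutral", "happy", "happy"]
score = macro_f_measure(y_true, y_pred)
print(f"macro F-measure: {score:.3f}")
```

Macro averaging gives every emotion class the same weight regardless of how often it occurs, which is one common choice when classes are imbalanced; a frequency-weighted average would be an equally plausible reading of the reported figures.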
Keywords :
acoustic signal processing; emotion recognition; feature extraction; recurrent neural nets; signal classification; speech recognition; statistical analysis; support vector machines; RNN model; SVM multiclassifier; call centre; emotion labels; f-measure; i-vector feature; machine-based emotion recognition; natural speech data; prosodic acoustic features; real-world conversations; real-world recognition tasks; recurrent neural network; speech emotion recognition; speech signal representation; Acoustics; Databases; Emotion recognition; Feature extraction; Recurrent neural networks; Speech; Speech recognition; Emotion recognition; Recurrent neural networks; Speech analysis;
Conference_Title :
Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
Conference_Location :
Chengdu
DOI :
10.1109/ChinaSIP.2015.7230458