Acoustic emotion recognition using deep neural network

Author

Jianwei Niu ; Yanmin Qian ; Kai Yu

Author_Institution

Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

128

Lastpage

132

Abstract

Traditionally, acoustic emotion recognition systems have used Gaussian mixture models (GMMs) for classification. However, GMMs do not make good use of multiple frames of input data and cannot efficiently exploit high-dimensional dependencies among features, which makes it hard to improve recognition accuracy. Deep neural networks (DNNs) are artificial neural networks with more than one hidden layer, which are first pretrained layer by layer and then fine-tuned using the back-propagation algorithm. Well-trained DNNs are capable of modeling complex, non-linear features of the input training data and can better predict the probability distribution over classification labels. In this paper, we use DNNs to replace GMMs in the recognition system architecture and conduct a series of experiments using neural networks that involve deep learning. Six discrete emotional states are classified using both kinds of classifiers. Our work focuses on the performance of DNNs, and experiments show that the best recognition rate achieved by the DNN-based system is 8.2 percentage points higher than that of the baseline GMMs.
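The two ideas in the abstract that lend themselves to a small illustration are feeding the classifier several neighbouring frames at once and producing a probability distribution over the six emotion labels. The following is a minimal NumPy sketch of that forward pass only, not the authors' system. The feature dimension, context width, hidden-layer sizes, sigmoid activation, and the utterance-level decision rule are all assumptions made for illustration, and random weights stand in for a network that would really be pretrained layer by layer and fine-tuned with back-propagation.

```python
import numpy as np

NUM_EMOTIONS = 6  # the abstract mentions six discrete emotional states


def splice_frames(features, context):
    """Stack each frame with `context` neighbours on each side.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def init_dnn(layer_sizes, rng):
    """Random weights; a placeholder for pretrained and fine-tuned parameters."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def dnn_posteriors(inputs, params):
    """Sigmoid hidden layers followed by a softmax output layer."""
    hidden = inputs
    for weights, bias in params[:-1]:
        hidden = 1.0 / (1.0 + np.exp(-(hidden @ weights + bias)))
    weights, bias = params[-1]
    return softmax(hidden @ weights + bias)


rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))          # hypothetical: 50 frames, 13-dim features
spliced = splice_frames(frames, context=5)  # 11 frames per input vector
params = init_dnn([spliced.shape[1], 64, 64, NUM_EMOTIONS], rng)
posteriors = dnn_posteriors(spliced, params)

# Hypothetical utterance-level decision: sum frame log-posteriors, take the argmax.
utterance_label = int(np.argmax(np.log(posteriors).sum(axis=0)))
```

Each row of `posteriors` is a distribution over the six labels for one spliced input vector. Whether the original system combines frame-level outputs this way is not stated in the abstract.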

Keywords

Gaussian processes; acoustic signal processing; backpropagation; emotion recognition; mixture models; neural nets; pattern classification; statistical distributions; DNN-based system; GMM; Gaussian mixture models; acoustic emotion recognition system; artificial neural networks; back propagation algorithm; classification label; deep learning; deep neural networks; discrete emotional states; nonlinear features; probability distribution; recognition accuracy; recognition rate; recognition system architecture; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech recognition; restricted Boltzmann machine

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936657

Filename

6936657