An experimental study of speech emotion recognition based on deep convolutional neural networks

Author

W. Q. Zheng;J. S. Yu;Y. X. Zou

Author_Institution

ADSPLAB/ELIP, School of Electronic and Computer Engineering, Peking University, Shenzhen, China

fYear

2015

Firstpage

827

Lastpage

831

Abstract

Speech emotion recognition (SER) is a challenging task because it is unclear which features best reflect the characteristics of human emotion in speech. Traditional feature extraction methods, moreover, perform inconsistently across different emotion recognition tasks. Different spectrograms evidently provide information reflecting different emotions. This paper proposes a systematic approach to implementing an effective emotion recognition system based on deep convolutional neural networks (DCNNs) using labeled training audio data. Specifically, the log-spectrogram is computed, and principal component analysis (PCA) is used to reduce the dimensionality and suppress interference. The PCA-whitened spectrogram is then split into non-overlapping segments. The DCNN is constructed to learn the representation of emotion from these segments with labeled training speech data. Our preliminary experiments show that the proposed emotion recognition system based on DCNNs (containing 2 convolution and 2 pooling layers) achieves about 40% classification accuracy. Moreover, it also outperforms SVM-based classification using hand-crafted acoustic features.
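
The preprocessing steps named in the abstract (log-spectrogram, PCA whitening, splitting into non-overlapping segments) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Hann window, the frame length, hop size, number of principal components, and segment length are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-power spectrogram, shape (n_frames, frame_len // 2 + 1).
    frame_len, hop and the Hann window are illustrative assumptions."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def pca_whiten(X, n_components, eps=1e-8):
    """Project rows of X onto the top principal components and rescale
    each component to (approximately) unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return (Xc @ eigvecs[:, order]) / np.sqrt(eigvals[order] + eps)

def split_segments(X, seg_len):
    """Cut X along time into non-overlapping segments of seg_len frames,
    dropping any incomplete trailing segment."""
    n_seg = X.shape[0] // seg_len
    return X[: n_seg * seg_len].reshape(n_seg, seg_len, X.shape[1])

# Usage on synthetic noise standing in for one second of 16 kHz speech.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
spec = log_spectrogram(audio)                   # (frames, frequency bins)
whitened = pca_whiten(spec, n_components=40)    # (frames, 40)
segments = split_segments(whitened, seg_len=20) # (segments, 20, 40)
```

Each resulting segment is a small two-dimensional array that could be fed to a convolutional network as a single labeled training example, with the label inherited from the utterance it was cut from.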

Keywords

"Speech","Speech recognition","Emotion recognition","Spectrogram","Feature extraction","Principal component analysis","Convolution"

Publisher

IEEE

Conference_Titel

Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on

Electronic_ISBN

2156-8111

Type

conf

DOI

10.1109/ACII.2015.7344669

Filename

7344669