  • DocumentCode
    134264
  • Title
    Acoustic emotion recognition using deep neural network
  • Author
    Jianwei Niu; Yanmin Qian; Kai Yu
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    128
  • Lastpage
    132
  • Abstract
    Traditionally, acoustic emotion recognition systems have used Gaussian Mixture Models (GMMs) for classification. However, GMMs do not make good use of multiple frames of input data and cannot efficiently exploit the high-dimensional dependencies among features, which makes it hard to improve recognition accuracy further. Deep neural networks (DNNs) are artificial neural networks with more than one hidden layer; they are first pretrained layer by layer and then fine-tuned using the back-propagation algorithm. Well-trained DNNs are capable of modeling complex, non-linear features of the input training data and can better predict the probability distribution over classification labels. In this paper, we used DNNs to replace GMMs in the recognition system architecture and conducted a series of experiments with deep neural networks. Six discrete emotional states are classified using these two kinds of classifiers. Our work focused on the performance of DNNs, and experiments showed that the best recognition rate achieved by the DNN-based system was 8.2 percentage points higher than that of the GMM baseline.
  • Keywords
    Gaussian processes; acoustic signal processing; backpropagation; emotion recognition; mixture models; neural nets; pattern classification; statistical distributions; DNN-based system; GMM; Gaussian mixture models; acoustic emotion recognition system; artificial neural networks; back propagation algorithm; classification label; deep learning; deep neural networks; discrete emotional states; nonlinear features; probability distribution; recognition accuracy; recognition rate; recognition system architecture; Acoustics; Emotion recognition; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech recognition; deep neural networks; emotion recognition; gaussian mixture models; restricted Boltzmann machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936657
  • Filename
    6936657
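The abstract contrasts two classifier back-ends. A GMM baseline of the kind it describes is commonly built as one GMM per emotion, with an utterance assigned to the emotion whose model gives the highest total log-likelihood over its frames. The NumPy sketch below assumes that setup, with diagonal covariances and toy parameters. The labels `emotion_0` to `emotion_5` are hypothetical placeholders (the record does not name the six emotional states), and this is an illustration, not the authors' code. Each frame is scored on its own, which is one way to read the abstract's remark that GMMs "do not make good use of multiple frames".

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    Returns an array of shape (T,).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)    # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)    # (T, M)
    log_comp += np.log(weights)
    # Log-sum-exp over mixture components, computed stably.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]

def classify_utterance(frames, emotion_gmms):
    """Pick the emotion whose GMM gives the highest summed frame log-likelihood."""
    scores = {label: diag_gmm_loglik(frames, *params).sum()
              for label, params in emotion_gmms.items()}
    return max(scores, key=scores.get), scores

# Toy usage: six hypothetical emotion models, 2-D features, 2 components each.
rng = np.random.default_rng(0)
labels = [f"emotion_{k}" for k in range(6)]            # placeholder names
emotion_gmms = {
    label: (np.array([0.5, 0.5]),                      # mixture weights
            np.array([[3.0 * k, 0.0], [3.0 * k, 1.0]]),  # component means
            np.ones((2, 2)))                           # diagonal variances
    for k, label in enumerate(labels)
}
frames = rng.normal(loc=[6.0, 0.5], scale=0.5, size=(200, 2))
best, scores = classify_utterance(frames, emotion_gmms)
print(best)  # → emotion_2
```

Training the per-emotion GMMs (typically with EM) is omitted; only the scoring and decision rule are shown.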
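On the DNN side, a common way to let the network see "multiple frames of input data" is to splice each frame with its neighbours before the forward pass. The sketch below assumes a context of ±5 frames, 13-dimensional features, two sigmoid hidden layers of 256 units, and an utterance-level decision made by averaging frame posteriors. All of these sizes and the averaging rule are illustrative assumptions, not the configuration reported in the paper. The weights are random, so the predicted label is meaningless; the point is the data flow from spliced frames to a posterior over six emotion classes.

```python
import numpy as np

def splice_frames(feats, context):
    """Stack each frame with `context` neighbours on either side.

    Edge frames are padded by repeating the first/last frame, so the
    output has shape (T, (2 * context + 1) * D), ordered t-context .. t+context.
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    T = feats.shape[0]
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dnn_posteriors(x, layers):
    """Forward pass: sigmoid hidden layers, softmax output over emotion labels."""
    h = x
    for W, b in layers[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    W_out, b_out = layers[-1]
    return softmax(h @ W_out + b_out)

def classify_utterance_dnn(feats, layers, context):
    """Average frame-level posteriors over the utterance and take the argmax."""
    post = dnn_posteriors(splice_frames(feats, context), layers)
    utt_post = post.mean(axis=0)
    return int(np.argmax(utt_post)), utt_post

# Toy usage with random (untrained) weights.
rng = np.random.default_rng(0)
D, CONTEXT, N_EMOTIONS = 13, 5, 6          # illustrative sizes (assumptions)
sizes = [(2 * CONTEXT + 1) * D, 256, 256, N_EMOTIONS]
layers = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
feats = rng.standard_normal((50, D))        # stand-in for 50 frames of features
label, utt_post = classify_utterance_dnn(feats, layers, CONTEXT)
print(splice_frames(feats, CONTEXT).shape)  # → (50, 143)
```

Averaging posteriors is only one utterance-level rule; summing log-posteriors or majority voting over frames are equally plausible alternatives.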
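The abstract states that the DNNs are first pretrained layer by layer and then fine-tuned with back-propagation, and the keywords mention restricted Boltzmann machines. A minimal sketch of greedy layer-wise pretraining with one-step contrastive divergence (CD-1) on Bernoulli-Bernoulli RBMs follows. It assumes inputs scaled to [0, 1]. Real-valued acoustic features are usually handled with a Gaussian-Bernoulli RBM in the first layer, which is not shown. The layer sizes, learning rate and epoch count are arbitrary assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 step on a Bernoulli-Bernoulli RBM; updates W, b_vis, b_hid in place.

    Returns the mean squared reconstruction error of the batch.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return float(((v0 - p_v1) ** 2).mean())

def pretrain_stack(data, layer_sizes, epochs, lr, seed):
    """Greedy layer-wise pretraining: each RBM is trained on the previous layer's output."""
    rng = np.random.default_rng(seed)
    layers = []
    x = data
    for n_hid in layer_sizes:
        n_vis = x.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis = np.zeros(n_vis)
        b_hid = np.zeros(n_hid)
        for _ in range(epochs):
            cd1_update(W, b_vis, b_hid, x, lr, rng)
        layers.append((W, b_hid))
        x = sigmoid(x @ W + b_hid)   # hidden probabilities feed the next RBM
    return layers

# Toy usage on random binary data (a stand-in for normalised features).
rng = np.random.default_rng(0)
data = (rng.random((100, 20)) < 0.3).astype(float)
layers = pretrain_stack(data, layer_sizes=[16, 8], epochs=5, lr=0.1, seed=1)
print([W.shape for W, _ in layers])  # → [(20, 16), (16, 8)]
```

After pretraining, the stacked `(W, b_hid)` pairs would initialise the hidden layers of a DNN, a randomly initialised softmax layer over the six emotion classes would be added on top, and the whole network would then be fine-tuned with back-propagation (not shown).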