• DocumentCode
    1995488
  • Title
    A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
  • Author
    Samantaray, Amiya Kumar ; Mahapatra, Kamalakanta ; Kabi, Bibek ; Routray, Aurobinda
  • Author_Institution
    Phoenix Robotix, Rourkela, India
  • fYear
    2015
  • fDate
    9-11 July 2015
  • Firstpage
    372
  • Lastpage
    377
  • Abstract
    Speech emotion recognition is one of the recent challenges in speech processing and Human Computer Interaction (HCI), driven by the operational needs of real-world applications. Besides human facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium for perceiving emotions and provides in-depth information about the different cognitive states of a human being. In this context, a novel approach is introduced that combines prosody features (i.e. pitch, energy, zero-crossing rate), quality features (i.e. formant frequencies, spectral features, etc.), derived features (i.e. Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC)) and a dynamic feature (Mel-Energy spectrum Dynamic Coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happy, neutral, sad and surprise, in 'five native Assamese languages'. The overall results of the conducted experiments show that the combination of features achieved an average accuracy of 82.26% for speaker-independent cases.
  • Keywords
    emotion recognition; human computer interaction; natural language processing; prediction theory; signal classification; speech coding; speech recognition; support vector machines; Assamese languages; HCI; LPCC; MEDC; North-Eastern languages; anger emotion state; automatic human emotion recognition; discrete emotional states; disgust emotion state; dynamic feature; fear emotion state; happy emotion state; human computer interaction; human facial expressions; linear predictive coding coefficients; mel-energy spectrum dynamic coefficients; multilevel SVM classifier; neutral emotion state; prosody features; quality features; sad emotion state; speech emotion recognition; speech processing; surprise emotion state; Emotion recognition; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Support vector machines; Linear Predictive Coding Coefficients; Mel Frequency Cepstral Coefficients; Prosody features; Quality features; Speech Emotion Recognition; Support Vector Machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
  • Conference_Location
    Kolkata
  • Type
    conf
  • DOI
    10.1109/ReTIS.2015.7232907
  • Filename
    7232907