A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages

Author

Samantaray, Amiya Kumar; Mahapatra, Kamalakanta; Kabi, Bibek; Routray, Aurobinda

Author_Institution

Phoenix Robotix, Rourkela, India

fYear

2015

fDate

9-11 July 2015

Firstpage

372

Lastpage

377

Abstract

Speech emotion recognition is one of the current challenges in speech processing and Human-Computer Interaction (HCI), driven by the operational needs of many real-world applications. Alongside facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium through which emotions are perceived, and it provides in-depth information about the cognitive state of a human being. In this context, a novel approach is introduced that combines prosody features (e.g. pitch, energy, zero-crossing rate), quality features (e.g. formant frequencies, spectral features), derived features (Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding Coefficients (LPCC)) and a dynamic feature (Mel-Energy spectrum Dynamic Coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happy, neutral, sad and surprise, in "five native Assamese languages". Overall, the experiments show that the combined feature set achieved an average accuracy of 82.26% in speaker-independent cases.
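As an illustration of two of the prosody features named above (energy and zero-crossing rate), the sketch below computes them frame by frame and summarizes each with its mean and standard deviation to form a fixed-length utterance vector. The abstract does not state frame length, hop size, or which statistics were used, so the 25 ms / 10 ms framing at 16 kHz, the mean/std summary, and all function names here are hypothetical assumptions, not the authors' method. Pitch, formants, MFCC, LPCC, MEDC and the SVM stage are not shown.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames (hypothetical helper; no padding)."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def _mean_std(values: Sequence[float]) -> tuple:
    """Population mean and standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def prosody_stats(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """Return [energy mean, energy std, ZCR mean, ZCR std] for one utterance.

    Defaults assume 16 kHz audio with 25 ms frames and a 10 ms hop (an assumption).
    """
    frames = frame_signal(signal, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    e_mean, e_std = _mean_std([short_time_energy(f) for f in frames])
    z_mean, z_std = _mean_std([zero_crossing_rate(f) for f in frames])
    return [e_mean, e_std, z_mean, z_std]


if __name__ == "__main__":
    sr = 16000
    # One second of a 200 Hz tone; the small phase offset avoids samples landing exactly on zero.
    tone = [math.sin(2 * math.pi * 200 * n / sr + 0.1) for n in range(sr)]
    print(prosody_stats(tone))
```

In a complete system, vectors like this would presumably be concatenated with the other feature groups and passed to a multiclass SVM (for example scikit-learn's `SVC`, which handles multiple classes internally); that pairing is likewise an assumption made for illustration.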

Keywords

emotion recognition; human computer interaction; natural language processing; prediction theory; signal classification; speech coding; speech recognition; support vector machines; Assamese languages; HCI; LPCC; MEDC; North-Eastern languages; anger emotion state; automatic human emotion recognition; discrete emotional states; disgust emotion state; dynamic feature; fear emotion state; happy emotion state; human facial expressions; linear predictive coding coefficients; mel-energy spectrum dynamic coefficients; multilevel SVM classifier; neutral emotion state; prosody features; quality features; sad emotion state; speech emotion recognition; speech processing; surprise emotion state; Emotion recognition; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Support vector machines; Linear Predictive Coding Coefficients; Mel Frequency Cepstral Coefficients; Prosody features; Quality features; Speech Emotion Recognition; Support Vector Machine

fLanguage

English

Publisher

IEEE

Conference_Title

Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on

Conference_Location

Kolkata

Type

conf

DOI

10.1109/ReTIS.2015.7232907

Filename

7232907

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1995488