Title :
A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
Author :
Samantaray, Amiya Kumar ; Mahapatra, Kamalakanta ; Kabi, Bibek ; Routray, Aurobinda
Author_Institution :
Phoenix Robotix, Rourkela, India
Abstract :
Speech emotion recognition is one of the recent challenges in speech processing and Human-Computer Interaction (HCI), addressing various operational needs of real-world applications. Besides human facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium for conveying emotions and provides in-depth information about the different cognitive states of a human being. In this context, a novel approach is introduced that uses a combination of prosody features (i.e. pitch, energy, zero-crossing rate), quality features (e.g. formant frequencies, spectral features), derived features (i.e. Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC)) and a dynamic feature (Mel-Energy spectrum Dynamic Coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happy, neutral, sad and surprise, in "Five native Assamese Languages". The overall results of the conducted experiments show that the combination of features achieved an average accuracy of 82.26% for speaker-independent cases.
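As an illustration of two of the prosody features named in the abstract (energy and zero-crossing rate), the following is a minimal sketch, not the authors' implementation. The function names, the frame length of 400 samples and the hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate) are assumptions chosen for the example; pitch, quality, derived and dynamic features, as well as the SVM classifier, are outside its scope.

```python
def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def prosody_features(signal, frame_len=400, hop=160):
    """Utterance-level statistics of frame energy and zero-crossing rate.

    frame_len and hop are assumed values (hypothetical defaults),
    not parameters reported in the paper.
    Returns [mean energy, max energy, mean ZCR, max ZCR].
    """
    fs = frames(signal, frame_len, hop)
    if not fs:
        raise ValueError("signal is shorter than one frame")
    energies = [short_time_energy(f) for f in fs]
    zcrs = [zero_crossing_rate(f) for f in fs]
    return [sum(energies) / len(energies), max(energies),
            sum(zcrs) / len(zcrs), max(zcrs)]
```

A vector such as the one returned by `prosody_features` could be concatenated with other per-utterance features before being passed to a classifier; how the paper combines its features is not specified in the abstract.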
Keywords :
emotion recognition; human computer interaction; natural language processing; prediction theory; signal classification; speech coding; speech recognition; support vector machines; Assamese languages; HCI; LPCC; MEDC; North-Eastern languages; anger emotion state; automatic human emotion recognition; discrete emotional states; disgust emotion state; dynamic feature; fear emotion state; happy emotion state; human computer interaction; human facial expressions; linear predictive coding coefficients; mel-energy spectrum dynamic coefficients; multilevel SVM classifier; neutral emotion state; prosody features; quality features; sad emotion state; speech emotion recognition; speech processing; surprise emotion state; Emotion recognition; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Support vector machines; Linear Predictive Coding Coefficients; Mel Frequency Cepstral Coefficients; Prosody features; Quality features; Speech Emotion Recognition; Support Vector Machine;
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ReTIS.2015.7232907