Title :
Model-based parametric features for emotion recognition from speech
Author :
Ananthakrishnan, Sankaranarayanan ; Vembu, Aravind Namandi ; Prasad, Rohit
Author_Institution :
Speech, Language & Multimedia Technol., Raytheon BBN Technol., Cambridge, MA, USA
Abstract :
Automatic emotion recognition from speech is desirable in many applications that rely on spoken language processing. Telephone-based customer service systems, psychological healthcare initiatives, and virtual training modules are examples of real-world applications that would benefit significantly from such a capability. Traditional utterance-level emotion recognition relies on a global feature set obtained by computing various statistics from raw segmental and supra-segmental measurements, including fundamental frequency (F0), energy, and MFCCs. In this paper, we propose a novel, model-based parametric feature set that better discriminates between the competing emotion classes. Our approach relaxes the modeling assumptions associated with using global statistics (e.g., mean, standard deviation) of traditional segment-level features for classification, and yields significant improvements over the state of the art in 7-way emotion classification accuracy on the standard, freely available Berlin Emotional Speech Corpus. These improvements are consistent even in a reduced feature space obtained by Fisher's Multiple Linear Discriminant Analysis, demonstrating the significantly higher discriminative power of the proposed feature set.
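The traditional baseline the abstract contrasts against collapses frame-level measurements into utterance-level global statistics. The sketch below illustrates only that baseline idea, not the proposed model-based features, which the abstract does not specify. It assumes each measurement (F0, energy, or a single MFCC coefficient) is already available as a per-frame list of floats; the function names and the particular statistics chosen (mean, standard deviation, min, max, range) are illustrative assumptions, not the authors' exact feature set.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical choice of global statistics; the paper's exact set is not given here.
STAT_KEYS = ("mean", "std", "min", "max", "range")


def global_statistics(contour: Sequence[float]) -> Dict[str, float]:
    """Collapse one frame-level contour (e.g. per-frame F0 or energy)
    into utterance-level global statistics."""
    values = list(contour)
    if len(values) < 2:
        raise ValueError("need at least two frames to compute a standard deviation")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def utterance_feature_vector(contours: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the statistics of every contour, in sorted name order,
    into one fixed-length vector per utterance, regardless of its duration."""
    vector: List[float] = []
    for name in sorted(contours):
        stats = global_statistics(contours[name])
        vector.extend(stats[key] for key in STAT_KEYS)
    return vector


if __name__ == "__main__":
    # Toy contours; real F0 tracks would also need unvoiced frames handled.
    features = utterance_feature_vector(
        {"energy": [1.0, 2.0, 3.0], "f0": [100.0, 110.0, 120.0]}
    )
    print(features)  # [2.0, 1.0, 1.0, 3.0, 2.0, 110.0, 10.0, 100.0, 120.0, 20.0]
```

Reducing every contour to a handful of summary numbers discards the temporal shape of the contour; this is the kind of modeling assumption the abstract says the proposed parametric features relax.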
Keywords :
emotion recognition; natural language processing; signal classification; speech recognition; statistical analysis; Berlin emotional speech corpus; MFCC; automatic emotion recognition; emotion classes; emotion classification accuracy; fundamental frequency; global statistics; model-based parametric feature set; model-based parametric features; multiple linear discriminant analysis; psychological healthcare initiatives; raw segmental measurement; real-world application; segment-level features; spoken language processing; supra-segmental measurement; telephone-based customer service system; utterance-level emotion recognition; virtual training module; Accuracy; Emotion recognition; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Vectors;
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
DOI :
10.1109/ASRU.2011.6163987