DocumentCode :
3703340
Title :
From simulated speech to natural speech, what are the robust features for emotion recognition?
Author :
Ya Li;Linlin Chao;Yazhu Liu;Wei Bao;Jianhua Tao
Author_Institution :
National Laboratory of Pattern Recognition, (NLPR), Institute of Automation, CAS, Beijing, China
fYear :
2015
Firstpage :
368
Lastpage :
373
Abstract :
The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora, and then extended to elicited corpora. Recently, the demand for real-world applications has forced research to shift to natural and spontaneous corpora. Previous research shows that emotion recognition accuracies gradually decline from simulated speech, to elicited speech, to fully natural speech. This paper investigates the effects of the commonly used spectral, prosody, and voice quality features on emotion recognition across these three types of corpora, and identifies the features that remain robust for emotion recognition on natural speech. Emotion recognition with several common machine learning methods is carried out and thoroughly compared. Three feature selection methods are applied to find the robust features. The results on six commonly used corpora confirm that recognition accuracies decrease as the corpus changes from simulated to natural. In addition, prosody and voice quality features are robust for emotion recognition on simulated corpora, while spectral features are robust on elicited and natural corpora.
Keywords :
"Emotion recognition","Speech","Robustness","Databases","Speech recognition","Feature extraction","Support vector machines"
Publisher :
ieee
Conference_Titel :
Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Electronic_ISBN :
2156-8111
Type :
conf
DOI :
10.1109/ACII.2015.7344597
Filename :
7344597