DocumentCode :
661276
Title :
Cross-lingual speech emotion recognition system based on a three-layer model for human perception
Author :
Elbarougy, Reda ; Akagi, Masato
Author_Institution :
Japan Adv. Inst. of Sci. & Technol. (JAIST), Nomi, Japan
fYear :
2013
fDate :
Oct. 29 2013-Nov. 1 2013
Firstpage :
1
Lastpage :
10
Abstract :
The purpose of this study is to investigate whether the emotion dimensions valence, activation, and dominance can be estimated cross-lingually. Most previous studies of automatic speech emotion recognition detected the emotional state within a single language. However, to develop a generalized emotion recognition system, the performance of such systems must be analyzed in both mono-lingual and cross-lingual settings. The ultimate goal of this study is to build a bilingual emotion recognition system that can estimate emotion dimensions for one language using a system trained on another language. We first propose a novel acoustic feature selection method based on a human perception model. The proposed model consists of three layers: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. Experimental results on two databases, one in Japanese and the other in German, show that the proposed method is effective for selecting acoustic features that represent emotion dimensions. The acoustic features common to the two databases are then used as the input to the cross-lingual emotion recognition system. The proposed cross-lingual system based on the three-layer model performs just as well as the two separate mono-lingual systems in estimating emotion dimension values.
Keywords :
emotion recognition; feature selection; natural languages; speech recognition; acoustic feature selection method; automatic speech emotion recognition; bilingual emotion recognition system; cross-lingual speech emotion recognition system; emotion dimension estimation; emotion dimension valence; emotional state detection; generalized emotion recognition system; human perception model; middle layer; mono-language; mono-lingual systems; semantic primitives; three-layer model; top layer; Acoustics; Correlation; Databases; Emotion recognition; Feature extraction; Semantics; Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
Conference_Location :
Kaohsiung
Type :
conf
DOI :
10.1109/APSIPA.2013.6694137
Filename :
6694137