Regional Information Center for Science and Technology (RICeST) - Cross-lingual speech emotion recognition system based on a three-layer model for human perception

DocumentCode :

661276

Title :

Cross-lingual speech emotion recognition system based on a three-layer model for human perception

Author :

Elbarougy, Reda ; Akagi, Masato

Author_Institution :

Japan Adv. Inst. of Sci. & Technol. (JAIST), Nomi, Japan

fYear :

2013

fDate :

Oct. 29, 2013 - Nov. 1, 2013

Firstpage :

Lastpage :

Abstract :

The purpose of this study is to investigate whether the emotion dimensions valence, activation, and dominance can be estimated cross-lingually. Most previous studies on automatic speech emotion recognition detected the emotional state within a single language. However, to develop a generalized emotion recognition system, its performance must be analyzed both mono-lingually and cross-lingually. The ultimate goal of this study is to build a bilingual emotion recognition system that can estimate emotion dimensions in one language using a system trained on another language. In this study, we first propose a novel acoustic feature selection method based on a human perception model. The proposed model consists of three layers: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. The experimental results show that the proposed method is effective for selecting acoustic features that represent emotion dimensions on two different databases, one in Japanese and the other in German. Finally, the acoustic features common to both databases are used as the input to the cross-lingual emotion recognition system. Moreover, the proposed cross-lingual system based on the three-layer model performs just as well as the two separate mono-lingual systems in estimating emotion dimension values.
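The three-layer selection and the cross-lingual "common features" step described in the abstract can be illustrated with a minimal sketch. This record does not state the actual selection criterion, so the sketch assumes a simple rule: a semantic primitive is linked to an emotion dimension, and an acoustic feature to a primitive, when the absolute Pearson correlation between them reaches a threshold (0.45 here is a hypothetical value). The helpers `pearson`, `select_features`, and `common_features`, the feature and primitive names, and all toy ratings are invented for illustration; they are not the authors' code or data.

```python
from math import sqrt

# Hypothetical sketch: correlation-threshold selection through a
# three-layer model (emotion dimension -> semantic primitive -> acoustic feature).
# The threshold and all names/data below are illustrative assumptions.
THRESHOLD = 0.45


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(acoustic, primitives, dimensions, threshold=THRESHOLD):
    """Map each emotion dimension to the acoustic features linked to it via primitives.

    Each argument is a dict: name -> list of per-utterance values (same utterance order).
    """
    selected = {}
    for dim, d_vals in dimensions.items():
        # Top -> middle layer: primitives strongly correlated with this dimension.
        linked = [p for p, p_vals in primitives.items()
                  if abs(pearson(p_vals, d_vals)) >= threshold]
        # Middle -> bottom layer: acoustic features strongly correlated with a linked primitive.
        selected[dim] = {f for f, f_vals in acoustic.items()
                         for p in linked
                         if abs(pearson(f_vals, primitives[p])) >= threshold}
    return selected


def common_features(sel_a, sel_b):
    """Per-dimension intersection of two selections (e.g. two languages)."""
    return {dim: feats & sel_b.get(dim, set()) for dim, feats in sel_a.items()}


# Toy data (invented): five utterances per hypothetical "database".
dimensions_jp = {"valence": [1, 2, 3, 4, 5]}
primitives_jp = {"bright": [1, 2, 3, 4, 5], "heavy": [5, 1, 4, 2, 3]}
acoustic_jp = {"F0_mean": [2, 4, 6, 8, 10], "power": [3, 1, 3, 1, 3]}

dimensions_de = {"valence": [3, 1, 2, 5, 4]}
primitives_de = {"bright": [3, 1, 2, 5, 4]}
acoustic_de = {"F0_mean": [30, 10, 20, 50, 40], "power": [6, 2, 4, 10, 8]}

selected_jp = select_features(acoustic_jp, primitives_jp, dimensions_jp)
selected_de = select_features(acoustic_de, primitives_de, dimensions_de)
common = common_features(selected_jp, selected_de)
print(common)  # → {'valence': {'F0_mean'}}
```

In this toy run, "power" is selected for the German data but not the Japanese data, so only "F0_mean" survives the intersection. Taking the per-dimension intersection mirrors the abstract's step of feeding only the features shared by both databases to the cross-lingual system; a real system would then train a dimension estimator on those features, which this sketch omits.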

Keywords :

emotion recognition; feature selection; natural languages; speech recognition; acoustic feature selection method; automatic speech emotion recognition; bilingual emotion recognition system; cross-lingual speech emotion recognition system; emotion dimension estimation; emotion dimension valence; emotional state detection; generalized emotion recognition system; human perception model; middle layer; mono-language; mono-lingual systems; semantic primitives; three-layer model; top layer; Acoustics; Correlation; Databases; Emotion recognition; Feature extraction; Semantics; Speech;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Conference_Location :

Kaohsiung

Type :

conf

DOI :

10.1109/APSIPA.2013.6694137

Filename :

6694137

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=661276