Regional Information Center for Science and Technology (RICeST) - Speech emotion recognition system based on a dimensional approach using a three-layered model

DocumentCode :

590619

Title :

Speech emotion recognition system based on a dimensional approach using a three-layered model

Author :

Elbarougy, Reda ; Akagi, Masato

Author_Institution :

Japan Adv. Inst. of Sci. & Technol. (JAIST), Nomi, Japan

fYear :

2012

fDate :

3-6 Dec. 2012

Firstpage :

Lastpage :

Abstract :

This paper proposes a three-layer model for estimating the emotions expressed in a speech signal based on a dimensional approach. Most previous studies using the dimensional approach focused on the direct relationship between acoustic features and emotion dimensions (valence, activation, and dominance). However, the acoustic features that correlate with the valence dimension are fewer and weaker, and the valence dimension has been particularly difficult to predict. The ultimate goal of this study is to improve the dimensional approach so that the valence dimension can be predicted precisely. The proposed model consists of three layers: acoustic features, semantic primitives, and emotion dimensions. We aimed to construct a three-layer model that imitates how humans perceive and recognize emotions. In this study, we first investigated the correlations between the elements of the two-layered model and the elements of the three-layered model. In addition, we compared the two models by applying a fuzzy inference system (FIS) to estimate emotion dimensions. In our model, an FIS was used to estimate semantic primitives from acoustic features and then to estimate emotion dimensions from the estimated semantic primitives. The experimental results show that the proposed three-layered model outperforms the traditional two-layered model.
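
The cascade described in the abstract (acoustic features, then semantic primitives, then emotion dimensions) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a zero-order Sugeno-type fuzzy inference system with Gaussian membership functions, two acoustic features normalized to [0, 1], a single hypothetical semantic primitive called "bright", and hand-picked rule parameters. The record does not specify any of these details, and all names (`sugeno_fis`, `BRIGHT_RULES`, `VALENCE_RULES`, `estimate_valence`) are illustrative.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def sugeno_fis(inputs, rules):
    """Zero-order Sugeno inference.

    Each rule is (antecedents, consequent): antecedents holds one
    (center, width) pair per input, and consequent is a constant.
    The output is the firing-strength-weighted mean of the consequents.
    """
    num = 0.0
    den = 0.0
    for antecedents, consequent in rules:
        strength = 1.0
        for x, (center, width) in zip(inputs, antecedents):
            strength *= gaussian(x, center, width)  # product t-norm
        num += strength * consequent
        den += strength
    if den == 0.0:
        # No rule fired (numerical underflow): fall back to the plain mean.
        return sum(c for _, c in rules) / len(rules)
    return num / den


# Layer 1 -> layer 2: hypothetical rules mapping two normalized acoustic
# features to the degree of a made-up semantic primitive "bright".
BRIGHT_RULES = [
    ([(0.0, 0.3), (0.0, 0.3)], 0.0),  # both features low  -> not bright
    ([(1.0, 0.3), (1.0, 0.3)], 1.0),  # both features high -> bright
]

# Layer 2 -> layer 3: hypothetical rules mapping "bright" to valence in [-1, 1].
VALENCE_RULES = [
    ([(0.0, 0.3)], -1.0),  # not bright -> negative valence
    ([(1.0, 0.3)], 1.0),   # bright     -> positive valence
]


def estimate_valence(acoustic_features):
    """Three-layer estimate: acoustic features -> primitive -> valence."""
    bright = sugeno_fis(acoustic_features, BRIGHT_RULES)
    return sugeno_fis([bright], VALENCE_RULES)


if __name__ == "__main__":
    print(estimate_valence([0.9, 0.8]))
    print(estimate_valence([0.1, 0.2]))
```

A two-layered baseline in the same style would apply a single call to `sugeno_fis` that maps the acoustic features directly to valence, skipping the intermediate primitive.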

Keywords :

emotion recognition; fuzzy reasoning; speech recognition; FIS; acoustic features; activation; dimensional approach; dominance; emotion dimensions; emotions recognition; expressed emotions; fuzzy inference system; human perception; semantic primitives; speech emotion recognition system; speech signal; three-layer model; three-layered model; two-layered model; valence dimension; Acoustics; Correlation; Emotion recognition; Feature extraction; Hidden Markov models; Semantics; Speech;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific

Conference_Location :

Hollywood, CA

Print_ISBN :

978-1-4673-4863-8

Type :

conf

Filename :

6411766

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=590619