Automatic Emotion Variation Detection in continuous speech

Author

Yuchao Fan ; Mingxing Xu ; Zhiyong Wu ; Lianhong Cai

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

5

Abstract

Though emotion speech recognition has gained increasing interest in the field of Human Computer Interaction, it is still a challenge to automatically determine the emotion state type and the boundaries of each emotionally salient segment in continuous speech, a task called Automatic Emotion Variation Detection (AEVD). In this task, the input utterances are not pre-segmented and may contain emotion variations. This paper proposes a Multi-timescaled Sliding Window based AEVD method (MSW-AEVD). First, a fixed-length sliding window is used to segment continuous speech for classic emotion recognition. An emotion type is assigned to each window-shift according to the recognition results of all the sliding windows containing that window-shift. This basic procedure is then extended to multi-timescaled sliding windows, in which different features are used at different scales. Finally, a post-processing step refines the outputs. This work focuses on the anger-neutral and happiness-neutral cases, which dominate recent studies of AEVD. Performance is evaluated on two databases: the German database EMO-DB and the Chinese database TH1309-DB. Experimental results show that the proposed method significantly outperforms an HMM-based baseline.
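
The single-scale step described in the abstract (labelling each window-shift from all sliding windows that contain it) can be illustrated with a minimal sketch. The abstract does not state how the window results are combined, so the majority vote, the tie-breaking fallback to "neutral", and the names assign_shift_labels, to_segments, and shifts_per_window are all assumptions made for illustration, not the authors' implementation. The per-window labels are assumed to come from some external emotion classifier, and the window length is assumed to be a whole number of window-shifts.

```python
from collections import Counter
from itertools import groupby


def assign_shift_labels(window_labels, shifts_per_window, tie_label="neutral"):
    """Label each window-shift by voting over the windows that contain it.

    Assumption: window w covers shifts w .. w + shifts_per_window - 1,
    so shift s is covered by windows s - shifts_per_window + 1 .. s.
    Majority voting and the tie fallback are illustrative choices.
    """
    n_windows = len(window_labels)
    if n_windows == 0:
        return []
    n_shifts = n_windows + shifts_per_window - 1
    shift_labels = []
    for s in range(n_shifts):
        first = max(0, s - shifts_per_window + 1)
        last = min(n_windows - 1, s)
        ranked = Counter(window_labels[first:last + 1]).most_common()
        label, count = ranked[0]
        # If two labels share the top count, fall back to tie_label.
        if len(ranked) > 1 and ranked[1][1] == count:
            label = tie_label
        shift_labels.append(label)
    return shift_labels


def to_segments(shift_labels):
    """Merge runs of equal labels into (label, start_shift, end_shift) tuples,
    where end_shift is exclusive. Segment boundaries fall between runs."""
    segments = []
    start = 0
    for label, run in groupby(shift_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    # Hypothetical classifier outputs for four overlapping windows.
    windows = ["neutral", "neutral", "anger", "anger"]
    shifts = assign_shift_labels(windows, shifts_per_window=2)
    print(shifts)
    print(to_segments(shifts))
```

With the hypothetical input above, the third window-shift is covered by one "neutral" and one "anger" window, so the tie rule decides where the detected boundary lands. A multi-timescale variant would presumably repeat this vote with several window lengths and combine the results, but the abstract gives no detail on that combination, so it is not sketched here.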

Keywords

audio databases; emotion recognition; hidden Markov models; human computer interaction; signal detection; speech recognition; Chinese database TH1309-DB; German database EMO-DB; HMM-based baseline; MSW-AEVD; anger-neutral case; automatic emotion variation detection; continuous speech segmentation; emotion speech recognition; emotion state type; emotionally salient segment; happiness-neutral case; human computer interaction; multitimescaled sliding window based AEVD; window-shift; Databases; Emotion recognition; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041592

Filename

7041592