DocumentCode
118041
Title
Automatic Emotion Variation Detection in continuous speech
Author
Yuchao Fan ; Mingxing Xu ; Zhiyong Wu ; Lianhong Cai
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear
2014
fDate
9-12 Dec. 2014
Firstpage
1
Lastpage
5
Abstract
Though speech emotion recognition has gained increasing interest in the field of Human Computer Interaction, it remains challenging to automatically determine the emotion state type and the boundaries of each emotionally salient segment in continuous speech, a task known as Automatic Emotion Variation Detection (AEVD). In this task, the input utterances are not pre-segmented and may contain emotion variations. This paper proposes a Multi-timescaled Sliding Window based AEVD (MSW-AEVD). First, a fixed-length sliding window is used to segment continuous speech for classic emotion recognition. Each window-shift is assigned an emotion type according to the recognition results of all the sliding windows containing that window-shift. This basic procedure is then extended to multi-timescaled sliding windows, in which several different features are used for different scales. Finally, a post-processing step refines the final outputs. In this work, we focus on the anger-neutral and happiness-neutral cases, which dominate recent studies of AEVD. Performance is evaluated on two databases, the German database EMO-DB and the Chinese database TH1309-DB. Experimental results show that the proposed method significantly outperforms the HMM-based baseline.
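The window-shift labeling step described in the abstract can be illustrated as a vote over overlapping windows. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes that some classifier has already produced one emotion label per window, that consecutive windows at a given scale are offset by exactly one window-shift (so a window spanning k shifts covers shifts i to i + k - 1), and that votes from all timescales are pooled with equal weight. The abstract does not specify how scales are combined, how ties are broken, or what the post-processing step does, so the names `vote_shifts` and `to_segments`, the equal weighting, and the tie-break rule are all assumptions. Post-processing is omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def vote_shifts(scales: Sequence[Tuple[int, Sequence[str]]], n_shifts: int) -> List[str]:
    """Hypothetical sketch: label each window-shift by pooling votes from
    every sliding window (at every timescale) that covers it.

    scales: (shifts_per_window, per_window_labels) pairs, one per timescale.
    Assumption: window i at a scale with k shifts per window covers shifts
    i .. i + k - 1, i.e. consecutive windows are offset by one shift.
    """
    votes = [Counter() for _ in range(n_shifts)]
    for k, window_labels in scales:
        if len(window_labels) != n_shifts - k + 1:
            raise ValueError("expected n_shifts - k + 1 windows at each scale")
        for i, label in enumerate(window_labels):
            for j in range(i, i + k):  # every shift covered by window i
                votes[j][label] += 1   # equal weight per window (assumption)
    # On ties, most_common(1) returns the label that was counted first.
    return [v.most_common(1)[0][0] for v in votes]


def to_segments(shift_labels: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of identical shift labels into (start, end_exclusive, label)
    segments, giving the emotion boundaries that AEVD asks for."""
    segments: List[Tuple[int, int, str]] = []
    start = 0
    for j in range(1, len(shift_labels) + 1):
        if j == len(shift_labels) or shift_labels[j] != shift_labels[start]:
            segments.append((start, j, shift_labels[start]))
            start = j
    return segments


if __name__ == "__main__":
    # One scale: 3 shifts per window, 6 shifts in total, so 4 windows.
    labels = vote_shifts([(3, ["neutral", "neutral", "anger", "anger"])], n_shifts=6)
    print(labels)
    print(to_segments(labels))
```

With one scale of 3 shifts per window over 6 shifts, shift 2 is covered by two "neutral" windows and one "anger" window and stays neutral, while shift 3 tips to anger, so the boundary falls between shifts 2 and 3. Adding further `(k, labels)` pairs to `scales` mimics the multi-timescale extension under the equal-weight assumption.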
Keywords
audio databases; emotion recognition; hidden Markov models; human computer interaction; signal detection; speech recognition; Chinese database TH1309-DB; German database EMO-DB; HMM-based baseline; MSW-AEVD; anger-neutral case; automatic emotion variation detection; continuous speech segmentation; emotion speech recognition; emotion state type; emotionally salient segment; happiness-neutral case; human computer interaction; multitimescaled sliding window based AEVD; window-shift; Databases; Emotion recognition; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location
Siem Reap
Type
conf
DOI
10.1109/APSIPA.2014.7041592
Filename
7041592