Regional Information Center for Science and Technology - Speaker variability in speech based emotion models

DocumentCode :

1691155

Title :

Speaker variability in speech based emotion models - Analysis and normalisation

Author :

Sethu, Vidhyasaharan ; Epps, Julien ; Ambikairajah, E.

Author_Institution :

Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia

fYear :

2013

Firstpage :

7522

Lastpage :

7526

Abstract :

All features commonly utilised in speech based emotion classification systems capture both emotion-specific information and speaker-specific information. This paper proposes a novel method to gauge the effect of speaker-specific information on emotion modelling based on two measures: a Monte Carlo approximation to KL divergence and an estimate of feature variability based on diagonal covariance matrices. In addition, a novel speaker normalisation technique based on joint factor analysis is proposed. This method is analogous to channel compensation in speaker verification systems, with one significant extension. The model domain compensation is mapped back to frame-level features, allowing for use in a wider range of emotion classification frameworks and in conjunction with other normalisation techniques. Preliminary evaluations on the IEMOCAP database suggest that the proposed technique improves the performance of GMM based classification systems that use widely employed features such as pitch, MFCCs and deltas.
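
The abstract names a Monte Carlo approximation to KL divergence but this record does not give its exact form. As a minimal sketch, and not the authors' implementation, the standard estimator draws samples from one model p and averages log p(x) - log q(x). The example assumes both models are GMMs with diagonal covariances; the function names (`gmm_logpdf`, `gmm_sample`, `mc_kl_divergence`) and the `(weights, means, variances)` tuple layout are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical representation: a GMM is a tuple (weights, means, variances),
# where means[k] and variances[k] are per-dimension lists (diagonal covariance).

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at point x (list of floats)."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_comp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_comp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(log_comp)
    # log-sum-exp over components for numerical stability
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def gmm_sample(rng, weights, means, variances):
    """Draw one point: pick a component by weight, then sample each dimension."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])]

def mc_kl_divergence(p, q, n_samples=20000, seed=0):
    """Estimate KL(p || q) as the sample mean of log p(x) - log q(x), x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(rng, *p)
        total += gmm_logpdf(x, *p) - gmm_logpdf(x, *q)
    return total / n_samples

# Illustrative models only: two single-component, one-dimensional "GMMs".
p = ([1.0], [[0.0]], [[1.0]])   # N(0, 1)
q = ([1.0], [[1.0]], [[1.0]])   # N(1, 1)
kl_pq = mc_kl_divergence(p, q)  # the closed-form value for this pair is 0.5
```

For two unit-variance Gaussians whose means differ by 1, the closed-form divergence is 0.5, which gives a quick sanity check on the estimate. Sampling is needed in the general case because the KL divergence between two GMMs has no closed form.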

Keywords :

Monte Carlo methods; covariance matrices; emotion recognition; speech recognition; IEMOCAP database; KL divergence; Monte Carlo approximation; channel compensation; diagonal covariance matrix; emotion specific information; feature variability estimation; joint factor analysis; speaker normalisation technique; speaker specific information; speaker variability; speaker verification system; speech based emotion classification; speech based emotion model; Data models; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Support vector machine classification; KL divergence; emotion classification; joint factor analysis; speaker normalisation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639125

Filename :

6639125

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1691155