Title :
Emotional adaptive training for speaker verification
Author :
Fanhu Bie ; Dong Wang ; Thomas Fang Zheng ; Javier Tejedor ; Ruxin Chen
Author_Institution :
Center for Speech & Language Technol., Tsinghua Univ., Beijing, China
Date :
Oct. 29 2013-Nov. 1 2013
Abstract :
Speaker verification suffers significant performance degradation under emotion variation. In a previous study, we demonstrated that an adaptation approach based on MLLR/CMLLR can provide a significant performance improvement for verification on emotional speech. This paper follows this direction and presents an emotional adaptive training (EAT) approach. The approach iteratively estimates emotion-dependent CMLLR transformations and re-trains the speaker models with the transformed speech, so that emotional enrollment speech can be used to train a stronger speaker model. This is similar to speaker adaptive training (SAT) in speech recognition. The experiments are conducted on an emotional speech database containing recordings of 30 speakers in 5 emotions. The results demonstrate that the EAT approach provides significant performance improvements over the baseline system, in which the neutral enrollment data are used to train the speaker models and the emotional test utterances are verified directly. EAT also significantly outperforms two other emotion adaptation approaches: (1) the CMLLR-based approach, in which the speaker models are trained with the neutral enrollment speech and the emotional test utterances are transformed by CMLLR in verification; and (2) the MAP-based approach, in which the emotional enrollment data are used to train emotion-dependent speaker models and the emotional utterances are verified against the emotion-matched models.
Keywords :
emotion recognition; speaker recognition; EAT approach; MAP-based approach; SAT; emotion variation; emotion-dependent CMLLR transformation estimation; emotion-dependent speaker model training; emotion-matched models; emotional adaptive training; emotional enrollment speech data; emotional speech database; emotional test utterances; neutral enrollment speech data; performance improvement; speaker adaptive training; speaker model retraining; speaker verification; speech recognition; speech recordings; Adaptation models; Computers; Data models; Hidden Markov models; Spectrogram; Speech; Training;
Conference_Title :
2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Conference_Location :
Kaohsiung
DOI :
10.1109/APSIPA.2013.6694123