Title :
Cluster adaptive training of average voice models
Author :
Wan, Vincent ; Latorre, Javier ; Yanagisawa, Kei ; Gales, Mark ; Stylianou, Yannis
Author_Institution :
Toshiba Res. Eur. Ltd., Cambridge, UK
Abstract :
Hidden Markov model-based text-to-speech systems may be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach uses linear transforms to achieve this, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher-quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target speaker similarity than both AVM and CAT, while its speech quality lies between that of AVM and CAT.
Keywords :
decision trees; maximum likelihood estimation; pattern clustering; speech synthesis; transforms; AVM approach; CAT; average voice model approach; hidden Markov model based text-to-speech systems; linear transforms; low dimensional space; multiple decision tree cluster adaptive training; speaker modelling; speech quality; synthesised speech; target speaker similarity; Adaptation models; Decision trees; Hidden Markov models; Speech; Training; Transforms; Vectors; Speech synthesis; average voice model; cluster adaptive training; voice cloning
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853602