DocumentCode
177497
Title
Cluster adaptive training of average voice models
Author
Wan, Vincent ; Latorre, Javier ; Yanagisawa, Kei ; Gales, Mark ; Stylianou, Yannis
Author_Institution
Toshiba Res. Eur. Ltd., Cambridge, UK
fYear
2014
fDate
4-9 May 2014
Firstpage
280
Lastpage
284
Abstract
Hidden Markov model-based text-to-speech systems may be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach uses linear transforms to achieve this, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher-quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target speaker similarity than either AVM or CAT alone, while its speech quality lies between that of AVM and CAT.
Keywords
decision trees; maximum likelihood estimation; pattern clustering; speech synthesis; transforms; AVM approach; CAT; average voice model approach; hidden Markov model based text-to-speech systems; linear transforms; low dimensional space; multiple decision tree cluster adaptive training; speaker modelling; speech quality; synthesised speech; target speaker similarity; Adaptation models; Decision trees; Hidden Markov models; Speech; Training; Transforms; Vectors; Speech synthesis; average voice model; cluster adaptive training; voice cloning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6853602
Filename
6853602