• DocumentCode
    177497
  • Title
    Cluster adaptive training of average voice models
  • Author
    Wan, Vincent ; Latorre, Javier ; Yanagisawa, Kei ; Gales, Mark ; Stylianou, Yannis
  • Author_Institution
    Toshiba Res. Eur. Ltd., Cambridge, UK
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    280
  • Lastpage
    284
  • Abstract
    Hidden Markov model-based text-to-speech systems can be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach achieves this with linear transforms, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target speaker similarity than either AVM or CAT alone, while its speech quality lies between that of AVM and CAT.
  • Keywords
    decision trees; maximum likelihood estimation; pattern clustering; speech synthesis; transforms; AVM approach; CAT; average voice model approach; hidden Markov model based text-to-speech systems; linear transforms; low dimensional space; multiple decision tree cluster adaptive training; speaker modelling; speech quality; synthesised speech; target speaker similarity; Adaptation models; Decision trees; Hidden Markov models; Speech; Training; Transforms; Vectors; Speech synthesis; average voice model; cluster adaptive training; voice cloning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6853602
  • Filename
    6853602