• DocumentCode
    730764
  • Title
    Word embedding for recurrent neural network based TTS synthesis
  • Author
    Peilu Wang; Yao Qian; Frank K. Soong; Lei He; Hai Zhao
  • Author_Institution
    Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4879
  • Lastpage
    4883
  • Abstract
    Current state-of-the-art TTS systems can produce synthesized speech of high quality when rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (TOBI), are time consuming to obtain because they must be labeled manually, and the labels are highly inconsistent across annotators. In this paper, we investigate the use of word embedding, which represents each word as a low-dimensional continuous-valued vector assumed to carry certain syntactic and semantic information, for bidirectional long short-term memory (BLSTM), recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embedding can significantly improve the performance of BLSTM-RNN based TTS synthesis without using TOBI and part-of-speech (POS) features.
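    To illustrate the idea the abstract describes, the sketch below shows how discrete word identities can be replaced by low-dimensional continuous-valued vectors that are then concatenated with other linguistic features to form a network input. This is a minimal, hypothetical example: the vocabulary, vector values, and feature layout are made up, not taken from the paper; a real system would load embeddings trained on a large text corpus.

    ```python
    # Illustrative sketch only: word-embedding lookup for a neural TTS
    # front end. The table values are invented; a real system would load
    # pre-trained vectors (e.g., learned from a large unlabeled corpus).

    EMBEDDING_DIM = 4

    # Hypothetical pre-trained embedding table (word -> dense vector).
    embeddings = {
        "hello": [0.21, -0.13, 0.05, 0.40],
        "world": [-0.07, 0.33, 0.18, -0.25],
    }
    UNK = [0.0] * EMBEDDING_DIM  # fallback for out-of-vocabulary words


    def embed_sentence(words):
        """Map each word to its embedding; OOV words get the UNK vector."""
        return [embeddings.get(w, UNK) for w in words]


    def input_features(word_vec, segmental):
        """Concatenate a word embedding with other linguistic features
        (e.g., a phone-identity one-hot) to form one network input vector."""
        return word_vec + segmental


    vecs = embed_sentence(["hello", "unknown", "world"])
    x = input_features(vecs[0], [1, 0, 0])  # 4-dim embedding + 3-dim one-hot
    ```

    The point of the substitution is that the continuous vectors carry distributional (syntactic/semantic) information, so hand-labeled features such as TOBI or POS tags need not be supplied separately.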
  • Keywords
    neural nets; speech synthesis; word processing; BLSTM-RNN based TTS synthesis; bidirectional long short term memory; low dimensional continuous-valued vector; recurrent neural network; segmental information; speech synthesis; suprasegmental information; word embedding; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Training; Upper bound; BLSTM; RNN; Speech synthesis; TTS; word embedding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD, Australia
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178898
  • Filename
    7178898