DocumentCode :
730764
Title :
Word embedding for recurrent neural network based TTS synthesis
Author :
Peilu Wang ; Yao Qian ; Soong, Frank K. ; Lei He ; Hai Zhao
Author_Institution :
Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4879
Lastpage :
4883
Abstract :
The current state-of-the-art TTS synthesis can produce speech of high quality when rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time consuming to obtain because they are manually labeled, with high inconsistency among different annotators. In this paper, we investigate the use of word embedding, which represents a word with a low-dimensional continuous-valued vector assumed to carry certain syntactic and semantic information, for bidirectional long short-term memory (BLSTM), recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embedding can significantly improve the performance of BLSTM-RNN based TTS synthesis without using ToBI and part-of-speech (POS) features.
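As context for the abstract, a minimal sketch of an embedding lookup, i.e., mapping each word to a low-dimensional continuous-valued vector as fed to a sequence model in place of discrete linguistic features. All names and values here are hypothetical, not the authors' implementation; real tables are learned from large corpora (e.g., by word2vec) rather than hand-written:

```python
EMB_DIM = 4  # toy dimensionality; real embeddings are typically 50-300 dims

# Toy embedding table; in practice these vectors are trained, not hand-set.
embeddings = {
    "hello": [0.1, -0.2, 0.3, 0.05],
    "world": [0.4, 0.1, -0.3, 0.2],
}
UNK = [0.0] * EMB_DIM  # fallback vector for out-of-vocabulary words

def embed_sentence(words):
    """Map a word sequence to a sequence of dense vectors, the kind of
    input a BLSTM-RNN front end would consume frame by frame."""
    return [embeddings.get(w, UNK) for w in words]

seq = embed_sentence(["hello", "world", "unseen"])
```

Each output vector is then typically concatenated with the remaining segmental features before being passed to the recurrent layers.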
Keywords :
neural nets; speech synthesis; word processing; BLSTM-RNN based TTS synthesis; bidirectional long short term memory; low dimensional continuous-valued vector; recurrent neural network; segmental information; speech synthesis; suprasegmental information; word embedding; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Training; Upper bound; BLSTM; RNN; Speech synthesis; TTS; word embedding;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178898
Filename :
7178898