DocumentCode
730764
Title
Word embedding for recurrent neural network based TTS synthesis
Author
Peilu Wang; Yao Qian; Frank K. Soong; Lei He; Hai Zhao
Author_Institution
Shanghai Jiao Tong Univ., Shanghai, China
fYear
2015
fDate
19-24 April 2015
Firstpage
4879
Lastpage
4883
Abstract
The current state-of-the-art TTS synthesis can produce speech of high quality when rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time consuming to obtain because they are manually labeled, with high inconsistency among different annotators. In this paper, we investigate the use of word embedding, which represents a word with a low-dimensional continuous-valued vector assumed to carry certain syntactic and semantic information, for bidirectional long short-term memory (BLSTM), recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embedding can significantly improve the performance of BLSTM-RNN based TTS synthesis without using ToBI and part-of-speech (POS) features.
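The core idea in the abstract is to feed learned word embeddings to the synthesizer in place of manually labeled ToBI and POS features. A minimal sketch of that input layer is shown below; the vocabulary, dimensions, and helper names are illustrative assumptions, not details from the paper, and a real system would learn the embedding matrix and feed the result to a BLSTM-RNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; in practice this would cover the training text.
vocab = {"the": 0, "cat": 1, "sat": 2}
embed_dim = 4  # low-dimensional continuous-valued vectors, per the abstract

# Embedding matrix: one row per word (randomly initialized here; learned in practice).
E = rng.standard_normal((len(vocab), embed_dim))

def embed(words):
    """Look up one embedding vector per word."""
    return np.stack([E[vocab[w]] for w in words])

def input_features(words, segmental):
    """Concatenate word embeddings with per-word segmental features,
    replacing manually labeled ToBI/POS inputs."""
    return np.concatenate([embed(words), segmental], axis=1)

words = ["the", "cat", "sat"]
segmental = rng.standard_normal((len(words), 3))  # e.g. phone/duration features
X = input_features(words, segmental)
print(X.shape)  # one feature row per word: embedding dims + segmental dims
```

The resulting matrix `X` would serve as the frame- or word-level input sequence to the BLSTM-RNN acoustic model.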
Keywords
neural nets; speech synthesis; word processing; BLSTM-RNN based TTS synthesis; bidirectional long short term memory; low dimensional continuous-valued vector; recurrent neural network; segmental information; speech synthesis; suprasegmental information; word embedding; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Training; Upper bound; BLSTM; RNN; Speech synthesis; TTS; word embedding;
fLanguage
English
Publisher
ieee
Conference_Titel
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178898
Filename
7178898