DocumentCode
730764
Title
Word embedding for recurrent neural network based TTS synthesis
Author
Peilu Wang; Yao Qian; Frank K. Soong; Lei He; Hai Zhao
Author_Institution
Shanghai Jiao Tong Univ., Shanghai, China
fYear
2015
fDate
19-24 April 2015
Firstpage
4879
Lastpage
4883
Abstract
The current state-of-the-art TTS synthesis can produce speech of high quality when rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time consuming to obtain because they are manually labeled, with high inconsistency among different annotators. In this paper, we investigate the use of word embedding, which represents a word with a low-dimensional continuous-valued vector assumed to carry certain syntactic and semantic information, for bidirectional long short-term memory (BLSTM), recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embedding can significantly improve the performance of BLSTM-RNN based TTS synthesis without using ToBI and part-of-speech (POS) features.
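The core idea in the abstract is to feed learned word embeddings to the synthesizer in place of manually labeled ToBI and POS features. A minimal sketch of that input layer is shown below; the vocabulary, dimensions, and helper names are illustrative assumptions, not details from the paper, and a real system would learn the embedding matrix and feed the result to a BLSTM-RNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; in practice this would cover the training text.
vocab = {"the": 0, "cat": 1, "sat": 2}
embed_dim = 4  # low-dimensional continuous-valued vectors, per the abstract

# Embedding matrix: one row per word (randomly initialized here; learned in practice).
E = rng.standard_normal((len(vocab), embed_dim))

def embed(words):
    """Look up one embedding vector per word."""
    return np.stack([E[vocab[w]] for w in words])

def input_features(words, segmental):
    """Concatenate word embeddings with per-word segmental features,
    replacing manually labeled ToBI/POS inputs."""
    return np.concatenate([embed(words), segmental], axis=1)

words = ["the", "cat", "sat"]
segmental = rng.standard_normal((len(words), 3))  # e.g. phone/duration features
X = input_features(words, segmental)
print(X.shape)  # one feature row per word: embedding dims + segmental dims
```

The resulting matrix `X` would serve as the frame- or word-level input sequence to the BLSTM-RNN acoustic model.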
Keywords
neural nets; speech synthesis; word processing; BLSTM-RNN based TTS synthesis; bidirectional long short term memory; low dimensional continuous-valued vector; recurrent neural network; segmental information; speech synthesis; suprasegmental information; word embedding; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Training; Upper bound; BLSTM; RNN; Speech synthesis; TTS; word embedding;
fLanguage
English
Publisher
ieee
Conference_Titel
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178898
Filename
7178898