DocumentCode :
730658
Title :
Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis
Author :
Tokuda, Keiichi ; Zen, Heiga
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4215
Lastpage :
4219
Abstract :
This paper proposes a novel approach for directly modeling speech at the waveform level using a neural network. This approach uses the neural network-based statistical parametric speech synthesis framework with a specially designed output layer. As acoustic feature extraction is integrated into acoustic model training, it can overcome the limitations of conventional approaches, such as two-step (feature extraction and acoustic modeling) optimization, the use of spectra rather than waveforms as targets, the use of overlapping and shifting frames as units, and a fixed decision tree structure. Experimental results show that the proposed approach can directly maximize the likelihood defined in the waveform domain.
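The abstract describes training by maximizing a likelihood defined directly on waveform samples rather than on extracted spectral features. The following is a minimal illustrative sketch only, not the authors' actual model: it assumes a per-sample Gaussian likelihood with a predicted mean and log-variance (the paper's actual output layer, based on adaptive cepstral analysis, is more elaborate) and shows that fitting the predicted mean to the raw waveform increases that waveform-domain log-likelihood.

```python
import numpy as np

# Hedged sketch: illustrative only, not the model from the paper.
# Idea: a network predicts a per-sample mean and log-variance, and
# training maximizes the Gaussian log-likelihood of the raw waveform.

rng = np.random.default_rng(0)

# Toy "waveform": 100 samples of a noisy sine.
t = np.linspace(0.0, 1.0, 100)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(100)

def waveform_log_likelihood(x, mu, log_var):
    """Sum of per-sample Gaussian log-densities in the waveform domain."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi) + log_var
                                + (x - mu) ** 2 / np.exp(log_var))))

# The "network" is reduced to constant parameters for this sketch.
mu = np.zeros_like(x)
log_var = np.zeros_like(x)

ll_before = waveform_log_likelihood(x, mu, log_var)

# One idealized "training" step: set the predicted mean to the target,
# which maximizes the likelihood for this toy case.
mu = x.copy()
ll_after = waveform_log_likelihood(x, mu, log_var)

print(ll_before < ll_after)  # the waveform-domain likelihood increases
```

In the actual paper, the mean and variance come from a trainable network conditioned on linguistic features, and the likelihood gradient flows through the specially designed output layer instead of being maximized in closed form as above.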
Keywords :
acoustic signal processing; feature extraction; learning (artificial intelligence); maximum likelihood estimation; neural nets; speech synthesis; acoustic feature extraction; acoustic model training; likelihood maximization; neural network; speech waveform direct model; statistical parametric speech synthesis; waveform domain; Algorithm design and analysis; Cepstral analysis; Distortion; Hidden Markov models; Speech; Training; Statistical parametric speech synthesis; adaptive cepstral analysis; neural network;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD, Australia
Type :
conf
DOI :
10.1109/ICASSP.2015.7178765
Filename :
7178765