Regional Information Center for Science and Technology - Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

DocumentCode :

13843

Title :

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Author :

Zhen-Hua Ling ; Shi-Yin Kang ; Heiga Zen ; Andrew Senior ; Mike Schuster ; Xiao-Jun Qian ; Helen M. Meng ; Li Deng

Author_Institution :

Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China

Volume :

Issue :

fYear :

2015

fDate :

May 2015

Firstpage :

Lastpage :

Abstract :

Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature. This article systematically reviews these emerging speech generation approaches, with the dual goal of helping readers gain a better understanding of the existing techniques as well as stimulating new work in the burgeoning area of deep learning for parametric speech generation.

Keywords :

Gaussian processes; acoustic signal processing; hidden Markov models; mixture models; neural nets; speech recognition; ASR; DNN; GMM; Gaussian mixture models; HMM; acoustic features; acoustic modeling; acoustic models; automatic speech recognition; burgeoning area; deep learning; deep neural networks; high-level symbolic inputs; human speech production; intermediate acoustic feature sequences; low-level speech waveforms; parametric speech generation; statistical parametric approach; Acoustic signal detection; Speech processing; Speech synthesis; Vocoders

fLanguage :

English

Journal_Title :

Signal Processing Magazine, IEEE

Publisher :

IEEE

ISSN :

1053-5888

Type :

jour

DOI :

10.1109/MSP.2014.2359987

Filename :

7078992

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=13843