Regional Information Center for Science and Technology - Model selection for mixture of gaussian based spectral modelling

DocumentCode :

3078036

Title :

Model selection for mixture of gaussian based spectral modelling

Author :

Zolfaghari, Parham ; Kato, Hiroko ; Minami, Yasuhiro ; Nakamura, Atsushi ; Katagiri, Shigeru

Author_Institution :

NTT Commun. Sci. Lab., NTT Corp., Kyoto

fYear :

2004

fDate :

Sept. 29 2004-Oct. 1 2004

Firstpage :

325

Lastpage :

334

Abstract :

In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixture of Gaussians (MoG) spectral modelling scheme that enables model selection, with two goals: easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians, and representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal ML-MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency with a pitch-adaptive Gaussian filter, which removes all source information from the spectra. By subjectively evaluating the quality of speech analysed and synthesised with this parametrisation scheme, we show that the Gaussian reduction scheme gives a considerable improvement over ML, specifically when a lower number of Gaussians is used in the mixture.
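
The abstract does not give the reduction procedure in detail, so the following is only a minimal sketch of the general idea of KL-guided Gaussian reduction, not the authors' algorithm. It assumes univariate components stored as (weight, mean, variance) tuples, uses a symmetrised KL divergence to find the closest pair, and merges that pair by moment matching. The function names, the choice of symmetrised KL, the moment-matching merge, and the example values are all assumptions made for illustration; the re-approximation step and the KL-based fitness measure mentioned in the abstract are not reproduced here.

```python
import math


def kl_gauss(m0, v0, m1, v1):
    """KL( N(m0, v0) || N(m1, v1) ) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def sym_kl(a, b):
    """Symmetrised KL between two components given as (weight, mean, var)."""
    (_, m0, v0), (_, m1, v1) = a, b
    return kl_gauss(m0, v0, m1, v1) + kl_gauss(m1, v1, m0, v0)


def merge(a, b):
    """Merge two weighted components, preserving total weight, mean and variance."""
    (w0, m0, v0), (w1, m1, v1) = a, b
    w = w0 + w1
    m = (w0 * m0 + w1 * m1) / w
    v = (w0 * (v0 + (m0 - m) ** 2) + w1 * (v1 + (m1 - m) ** 2)) / w
    return (w, m, v)


def reduce_mixture(components, target):
    """Greedily merge the closest pair (by symmetrised KL) until `target` remain."""
    comps = list(components)
    while len(comps) > max(target, 1):
        pairs = ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps)))
        i, j = min(pairs, key=lambda ij: sym_kl(comps[ij[0]], comps[ij[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return sorted(comps, key=lambda c: c[1])  # order by mean


# Hypothetical three-component mixture; means and variances are illustrative only.
mixture = [(0.3, 500.0, 100.0 ** 2), (0.2, 550.0, 120.0 ** 2), (0.5, 1500.0, 200.0 ** 2)]
reduced = reduce_mixture(mixture, target=2)
```

In this example the two overlapping components near 500 and 550 are merged into one, while the well-separated component at 1500 is left unchanged. A fuller implementation would presumably refit the reduced mixture to the spectrum after each merge, as the abstract's "re-approximate" suggests.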

Keywords :

Fourier transforms; Gaussian processes; acoustic resonance; spectral analysis; speech synthesis; Gaussian based spectral modelling; Gaussian reduction scheme; Kullback-Leibler distance; fitness measure; model selection; normalised smooth spectrum; parametric mixture model; pitch adaptive Gaussian filter; short-time Fourier transform; vocal tract; Adaptive filters; Cepstral analysis; Electronic mail; Frequency; Parametric statistics; Propagation losses; Resonance; Speech analysis

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning for Signal Processing, 2004. Proceedings of the 2004 14th IEEE Signal Processing Society Workshop

Conference_Location :

Sao Luis

ISSN :

1551-2541

Print_ISBN :

0-7803-8608-4

Type :

conf

DOI :

10.1109/MLSP.2004.1422990

Filename :

1422990

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3078036