Title :
Model selection for mixture of Gaussian based spectral modelling
Author :
Zolfaghari, Parham ; Kato, Hiroko ; Minami, Yasuhiro ; Nakamura, Atsushi ; Katagiri, Shigeru
Author_Institution :
NTT Commun. Sci. Lab., NTT Corp., Kyoto
Date :
Sept. 29 2004-Oct. 1 2004
Abstract :
In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixture of Gaussians (MoG) spectral modelling scheme that enables model selection, with two goals: easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians, and representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal maximum-likelihood (ML) MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency with a pitch-adaptive Gaussian filter. This filtering removes all source information from the spectra. By subjectively evaluating the quality of speech analysed and synthesised with this parametrisation scheme, we show that the Gaussian reduction scheme gives a considerable improvement over ML, specifically when a lower number of Gaussians is used in the mixture.
Keywords :
Fourier transforms; Gaussian processes; acoustic resonance; spectral analysis; speech synthesis; Gaussian based spectral modelling; Gaussian reduction scheme; Kullback-Leibler distance; fitness measure; model selection; normalised smooth spectrum; parametric mixture model; pitch adaptive Gaussian filter; short-time Fourier transform; vocal tract; Adaptive filters; Cepstral analysis; Electronic mail; Frequency; Parametric statistics; Propagation losses; Resonance; Speech analysis
Conference_Title :
Machine Learning for Signal Processing, 2004. Proceedings of the 2004 14th IEEE Signal Processing Society Workshop
Conference_Location :
Sao Luis
Print_ISBN :
0-7803-8608-4
DOI :
10.1109/MLSP.2004.1422990
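
Note :
The abstract describes reducing the number of Gaussians using the KL distance between densities in the mixture, but does not give the procedure. The following is only a minimal sketch of one common way such a reduction can be done, not the authors' algorithm: it assumes univariate Gaussians over frequency, uses the closed-form symmetrised KL divergence as the pairwise distance, and merges the closest pair by moment matching. The function names (`kl_gauss`, `sym_kl`, `merge`, `reduce_mixture`) and the example values are hypothetical.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """Closed-form KL(N(m1, v1) || N(m2, v2)); v1 and v2 are variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def sym_kl(a, b):
    """Symmetrised KL between two (weight, mean, variance) components.

    Assumption: the mixture weights are ignored when measuring distance.
    """
    _, m1, v1 = a
    _, m2, v2 = b
    return kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)

def merge(a, b):
    """Merge two components, preserving total weight, mean and variance."""
    w1, m1, v1 = a
    w2, m2, v2 = b
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)

def reduce_mixture(components, target):
    """Greedily merge the closest pair until `target` components remain."""
    comps = list(components)
    while len(comps) > max(target, 1):
        # Find the pair of components with the smallest symmetrised KL.
        i, j = min(
            ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))),
            key=lambda p: sym_kl(comps[p[0]], comps[p[1]]),
        )
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

# Hypothetical example: (weight, mean in Hz, variance in Hz^2).
mixture = [(0.4, 500.0, 100.0 ** 2), (0.3, 550.0, 120.0 ** 2), (0.3, 1500.0, 200.0 ** 2)]
reduced = reduce_mixture(mixture, 2)
```

In this example the two overlapping components near 500 Hz are merged, while the well-separated component at 1500 Hz is kept. A full scheme along the lines of the abstract would presumably re-fit the reduced mixture to the spectrum after each reduction; that step is omitted here.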