Title :
Speech Enhancement Using Gaussian Scale Mixture Models
Author :
Hao, Jiucang ; Lee, Te-Won ; Sejnowski, Terrence J.
Author_Institution :
Comput. Neurobiol. Lab., Salk Inst., La Jolla, CA, USA
Abstract :
This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two approximations to make it efficient: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), and those reconstructed from the estimated log-spectra produced a lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
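The generative structure described in the abstract can be illustrated with a minimal sampling sketch. It is not the authors' implementation: the function name `sample_gsmm_frame`, the parameter values, the diagonal (per-bin) treatment of the log-spectra, and the choice of a circular complex Gaussian that splits the variance evenly between real and imaginary parts are all assumptions made for illustration.

```python
import math
import random


def sample_gsmm_frame(weights, means, stds, rng):
    """Draw one frame from a toy Gaussian scale mixture model.

    weights[s]   : prior probability of mixture component s
    means[s][k]  : mean of the log-spectrum in bin k under component s
    stds[s][k]   : standard deviation of the log-spectrum in bin k
    All three are hypothetical inputs, not values from the paper.
    """
    # 1. Pick a mixture component with probability weights[s].
    s = rng.choices(range(len(weights)), weights=weights, k=1)[0]

    log_spectra = []
    coeffs = []
    for mu, sd in zip(means[s], stds[s]):
        # 2. The log-spectrum of this bin is Gaussian under component s (GMM).
        xi = rng.gauss(mu, sd)
        # 3. The frequency coefficient is zero-mean Gaussian whose variance
        #    is exp(xi), so the log-spectrum acts as a scaling factor.
        #    Assumption: real and imaginary parts each carry half the variance.
        scale = math.sqrt(math.exp(xi) / 2.0)
        coeffs.append(complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale)))
        log_spectra.append(xi)
    return s, log_spectra, coeffs


# Hypothetical two-component model over three frequency bins.
weights = [0.6, 0.4]
means = [[0.0, 1.0, -1.0], [2.0, 0.5, 0.0]]
stds = [[0.5, 0.5, 0.5], [0.3, 0.3, 0.3]]

rng = random.Random(0)
state, xi, X = sample_gsmm_frame(weights, means, stds, rng)
```

Because both the log-spectra `xi` and the coefficients `X` are random draws, an enhancement method built on this model has two quantities to estimate from a noisy observation, which is the distinction the abstract draws between the two reconstruction routes.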
Keywords :
Bayes methods; Gaussian noise; Laplace equations; covariance analysis; expectation-maximization algorithm; frequency-domain analysis; random processes; signal reconstruction; spectral analysis; speech enhancement; speech recognition; variational techniques; Bayesian inference; Gaussian scale mixture model (GSMM); Laplace method; covariance; frequency coefficient; frequency domain; log-spectral domain; noisy signal; posterior signal distribution; probabilistic relationship; random variable; signal-to-noise ratio; speech model; speech signal; speech-shaped noise; variational approximation; word recognition error rate; zero-mean Gaussian
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2030012