Title :
Waveform quantization of speech using Gaussian mixture models
Author :
Samuelsson, Jonas
Author_Institution :
Dept. of Signals, Sensors & Systems, Royal Institute of Technology (KTH), Stockholm, Sweden
Abstract :
Waveform quantization of speech using Gaussian mixture models (GMMs) is proposed. The GMMs are trained directly on the speech waveform, and high-dimensional vector quantizers (VQs) that efficiently exploit the redundancy are constructed from the GMM parameters. Two types of GMM are studied. The complexity of the scheme is independent of the rate, and the rate can be changed without retraining the VQ. A shape-gain structure improves performance and robustness. Pre- and post-processing using spectral amplitude warping further improves perceptual quality. A 32-dimensional VQ operating at 2 bits/sample reproduces speech sampled at 8 kHz with a PESQ score of 4.2.
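The operating point quoted in the abstract can be unpacked with a little arithmetic, and the shape-gain idea can be illustrated in its generic textbook form. The sketch below is only an illustration under stated assumptions: the function names, the test frame, and the use of the Euclidean norm as the gain are hypothetical choices made here, not details taken from the paper, and no GMM-based quantizer is implemented.

```python
import math

def bits_per_vector(dim, bits_per_sample):
    """Bits spent on one VQ vector at a given per-sample rate."""
    return dim * bits_per_sample

def bit_rate(sample_rate_hz, bits_per_sample):
    """Total bit rate in bit/s for a given sampling rate."""
    return sample_rate_hz * bits_per_sample

def shape_gain_split(x):
    """Generic shape-gain decomposition (assumed form, not the paper's):
    gain is the Euclidean norm, shape is the unit-norm direction."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        # An all-zero frame has no direction; return a zero shape.
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]

# Figures from the abstract: 32 dimensions, 2 bits/sample, 8 kHz sampling.
vector_bits = bits_per_vector(32, 2)   # 64 bits per vector
rate_bps = bit_rate(8000, 2)           # 16000 bit/s

# Hypothetical 32-sample frame, used only to exercise the decomposition.
frame = [math.sin(0.3 * n) for n in range(32)]
gain, shape = shape_gain_split(frame)
reconstructed = [gain * s for s in shape]
```

In a shape-gain quantizer the gain and the shape are typically quantized separately, so the codebook for the shape does not need to cover differences in signal level; how the paper allocates bits between the two is not stated in the abstract.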
Keywords :
Gaussian distribution; redundancy; spectral analysis; speech codecs; speech coding; vector quantisation; 8 kHz; GMM training; Gaussian mixture models; VQ; audio codecs; high dimensional vector quantizers; perceptual quality; shape-gain structure; spectral amplitude warping; speech waveform quantization; Bandwidth; Codecs; Covariance matrix; Data compression; Design optimization; Quantization; Robustness; Sensor systems; Speech; Surface acoustic waves
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1325948