Title :
Hybrid coding: combined harmonic and waveform coding of speech at 4 kb/s
Author :
Shlomot, Eyal ; Cuperman, Vladimir ; Gersho, Allen
Author_Institution :
Mindspeed Technol., Newport Beach, CA, USA
fDate :
9/1/2001
Abstract :
A new hybrid speech coding technique is presented in this paper, which combines a frequency-domain parametric coder (for stationary voiced and stationary unvoiced speech) with a time-domain waveform coder (for transition speech). Our hybrid coder uses a parametric representation for the excitation of a linear-prediction filter. The excitation of stationary voiced speech is a sum of harmonic cosines with interpolated magnitudes and a synthetic phase model, the excitation for stationary unvoiced speech is a spectrally shaped noise, and the excitation for transition speech is a set of signed pulses. Signal alignment when switching between the harmonic excitation of stationary voiced speech and the pulse model used for transition speech is required, and achieved by special alignment procedures. A 4 kb/s hybrid coder, which achieves high-quality reconstructed speech, is described. The 4 kb/s hybrid coder employs a neural network classifier, and a novel pitch detection and harmonic bandwidth estimation algorithm. The locations of excitation pulses for coding transitions are determined by analysis-by-synthesis. A simple and efficient dimension conversion and quantization of the harmonic spectral magnitudes of voiced speech was devised, combining the general nonsquare transform (NST) for dimension conversion and a weighted vector quantization (VQ) approach. Subjective listening tests demonstrate that the 4 kb/s hybrid coding scheme competes favorably with CELP coders at low bit-rates.
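The voiced-speech excitation described in the abstract (a sum of harmonic cosines with interpolated magnitudes) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name harmonic_excitation, its parameters, the 8 kHz sampling rate, the constant pitch within a frame, the linear interpolation rule, and the zero-phase placeholder standing in for the synthetic phase model are all assumptions made for illustration.

```python
import math


def harmonic_excitation(f0_hz, mags_start, mags_end, n_samples, fs_hz=8000.0):
    """Synthesize one frame as a sum of harmonic cosines.

    Hypothetical helper: magnitudes are linearly interpolated from
    mags_start (frame start) to mags_end (frame end). Pitch is held
    constant over the frame and every harmonic starts at zero phase;
    both are simplifying assumptions, not the paper's phase model.
    """
    n_harm = min(len(mags_start), len(mags_end))
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # fundamental, radians per sample
    frame = []
    for n in range(n_samples):
        t = n / n_samples  # interpolation weight: 0 at the frame start
        sample = 0.0
        for k in range(1, n_harm + 1):
            if k * f0_hz >= fs_hz / 2.0:
                break  # skip harmonics at or above the Nyquist frequency
            mag = (1.0 - t) * mags_start[k - 1] + t * mags_end[k - 1]
            sample += mag * math.cos(k * w0 * n)
        frame.append(sample)
    return frame


if __name__ == "__main__":
    # Two harmonics of a 100 Hz pitch, 20 ms frame at 8 kHz (160 samples).
    frame = harmonic_excitation(100.0, [1.0, 0.5], [0.8, 0.4], 160)
    print(len(frame), frame[0])
```

In a coder of the kind the abstract describes, such an excitation would drive a linear-prediction synthesis filter; that filter, the quantization of the magnitudes, and the alignment with the pulse model are outside the scope of this sketch.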
Keywords :
filtering theory; frequency-domain analysis; harmonics; neural nets; prediction theory; signal classification; signal reconstruction; signal representation; spectral analysis; speech coding; transform coding; vector quantisation; waveform analysis; 4 kbit/s; CELP coders; VQ; alignment procedures; analysis-by-synthesis; efficient dimension conversion; excitation pulses; frequency-domain parametric coder; harmonic bandwidth estimation algorithm; harmonic coding; harmonic cosines; harmonic excitation; hybrid speech coding; interpolated magnitudes; linear-prediction filter; neural network classifier; nonsquare transform; parametric representation; pitch detection algorithm; pulse model; signal alignment; signed pulses; spectral magnitudes; spectrally shaped noise; stationary unvoiced speech; stationary voiced speech; subjective listening tests; synthetic phase model; time-domain waveform coder; transition speech excitation; waveform coding; weighted vector quantization; Bandwidth; Neural networks; Noise shaping; Phase noise; Power harmonic filters; Pulse shaping methods; Quantization; Speech coding; Speech enhancement; Time domain analysis;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on