Title :
Encoding speech using prototype waveforms
Author :
Kleijn, W. Bastiaan
Author_Institution :
Speech Res. Dept., AT&T Bell Labs., Murray Hill, NJ, USA
fDate :
10/1/1993
Abstract :
Voiced speech is interpreted as a concatenation of slowly evolving pitch-cycle waveforms. This signal can be reconstructed by interpolation from a downsampled sequence of pitch-cycle waveforms, at a rate of one prototype waveform per 20-30 ms interval. Each prototype waveform is described by a set of linear-prediction (LP) filter coefficients, which capture the formant structure, and a prototype excitation waveform, which is quantized with analysis-by-synthesis procedures. The speech signal is reconstructed by filtering an excitation signal consisting of the concatenation of (infinitesimal) sections of the instantaneous excitation waveforms. To obtain the correct level of periodicity, the short-term and the long-term correlations between the instantaneous excitation waveforms can be controlled explicitly. Thus, distortions such as noise, reverberation, and buzziness can be prevented. The coding method is easily combined with existing LP-based speech coders, such as CELP, for unvoiced signals. Excellent voiced speech quality is obtained at rates between 3.0 and 4.0 kb/s.
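The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes two prototype excitation cycles sampled on a common normalized phase grid, and it assumes plain linear interpolation of both the waveform shape and the pitch period between them. The function name `interpolate_prototypes` and all parameters are hypothetical, and the LP filtering and quantization stages are not shown. This is not the paper's actual procedure, only an example of reading one sample at a time from an evolving "instantaneous" waveform.

```python
import numpy as np

def interpolate_prototypes(proto_a, proto_b, period_a, period_b, n_samples):
    """Synthesize n_samples of excitation while evolving from proto_a to proto_b.

    proto_a, proto_b: one pitch cycle each, on the same uniform phase grid [0, 1).
    period_a, period_b: pitch period (in samples) at the start and end.
    Linear interpolation is an assumption made for this sketch.
    """
    proto_a = np.asarray(proto_a, dtype=float)
    proto_b = np.asarray(proto_b, dtype=float)
    grid = np.arange(len(proto_a)) / len(proto_a)  # normalized phase of each point
    out = np.empty(n_samples)
    phase = 0.0
    for n in range(n_samples):
        w = n / n_samples                           # weight moves from 0 toward 1
        cycle = (1.0 - w) * proto_a + w * proto_b   # instantaneous waveform
        # Read a single sample from the instantaneous waveform at the current phase.
        out[n] = np.interp(phase, grid, cycle, period=1.0)
        period = (1.0 - w) * period_a + w * period_b
        phase = (phase + 1.0 / period) % 1.0        # advance by one sample
    return out
```

When both prototypes and both pitch periods are identical, the sketch reduces to a periodic repetition of the single cycle. When they differ, the output drifts smoothly from one cycle shape to the other.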
Keywords :
linear predictive coding; speech coding; 3 to 4 kbit/s; CELP; analysis-by-synthesis procedures; concatenation; excitation signal; instantaneous excitation waveforms; interpolation; linear-prediction filter coefficients; pitch-cycle waveforms; prototype waveforms; speech coding; speech signal reconstruction; unvoiced signals; voiced speech; Bit rate; Encoding; Filters; Interpolation; Prototypes; Reverberation; Signal sampling; Speech analysis; Speech coding; Speech processing;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on