Regional Information Center for Science and Technology (RICEST) - Hybrid coding: combined harmonic and waveform coding of speech at 4 kb/s

DocumentCode :

1522392

Title :

Hybrid coding: combined harmonic and waveform coding of speech at 4 kb/s

Author :

Shlomot, Eyal ; Cuperman, Vladimir ; Gersho, Allen

Author_Institution :

Mindspeed Technol., Newport Beach, CA, USA

Volume :

Issue :

fYear :

2001

fDate :

9/1/2001

Firstpage :

632

Lastpage :

646

Abstract :

A new hybrid speech coding technique is presented in this paper, which combines a frequency-domain parametric coder (for stationary voiced and stationary unvoiced speech) with a time-domain waveform coder (for transition speech). Our hybrid coder uses a parametric representation for the excitation of a linear-prediction filter. The excitation of stationary voiced speech is a sum of harmonic cosines with interpolated magnitudes and a synthetic phase model, the excitation for stationary unvoiced speech is a spectrally shaped noise, and the excitation for transition speech is a set of signed pulses. Signal alignment when switching between the harmonic excitation of stationary voiced speech and the pulse model used for transition speech is required, and is achieved by special alignment procedures. A 4 kb/s hybrid coder, which achieves high-quality reconstructed speech, is described. The 4 kb/s hybrid coder employs a neural network classifier, and a novel pitch detection and harmonic bandwidth estimation algorithm. The locations of excitation pulses for coding transitions are determined by analysis-by-synthesis. A simple and efficient dimension conversion and quantization of the harmonic spectral magnitudes of voiced speech was devised, combining the general nonsquare transform (NST) for dimension conversion and a weighted vector quantization (VQ) approach. Subjective listening tests demonstrate that the 4 kb/s hybrid coding scheme competes favorably with CELP coders at low bit-rates.
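
The voiced-speech excitation described in the abstract (a sum of harmonic cosines with interpolated magnitudes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `harmonic_excitation`, the 8 kHz sample rate, the constant pitch within a frame, the linear magnitude interpolation, and the zero initial phase (standing in for the paper's synthetic phase model) are all assumptions made for illustration.

```python
import math


def harmonic_excitation(f0_hz, mags_start, mags_end, n_samples, sample_rate=8000.0):
    """Return one frame of excitation as a sum of harmonic cosines.

    Assumptions (hypothetical, not taken from the paper):
    - the pitch f0_hz is constant over the frame,
    - each harmonic magnitude is linearly interpolated from its value at
      the frame start (mags_start) to its value at the frame end (mags_end),
    - every harmonic starts at phase zero.
    """
    if len(mags_start) != len(mags_end):
        raise ValueError("magnitude lists must have the same length")

    frame = []
    for n in range(n_samples):
        weight = n / n_samples  # interpolation weight, 0 at frame start
        sample = 0.0
        for k, (a0, a1) in enumerate(zip(mags_start, mags_end), start=1):
            freq = k * f0_hz
            if freq >= sample_rate / 2:
                break  # skip harmonics at or above the Nyquist frequency
            amp = (1.0 - weight) * a0 + weight * a1
            sample += amp * math.cos(2.0 * math.pi * freq * n / sample_rate)
        frame.append(sample)
    return frame


# Example: a 20 ms frame (160 samples at 8 kHz) with a 100 Hz pitch and
# two harmonics; the first harmonic fades out across the frame.
frame = harmonic_excitation(100.0, [1.0, 0.5], [0.0, 0.5], 160)
```

In the coder described above, such an excitation would then drive the linear-prediction synthesis filter; that stage is omitted from this sketch.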

Keywords :

filtering theory; frequency-domain analysis; harmonics; neural nets; prediction theory; signal classification; signal reconstruction; signal representation; spectral analysis; speech coding; transform coding; vector quantisation; waveform analysis; 4 kbit/s; CELP coders; VQ; alignment procedures; analysis-by-synthesis; efficient dimension conversion; excitation pulses; frequency-domain parametric coder; harmonic bandwidth estimation algorithm; harmonic coding; harmonic cosines; harmonic excitation; hybrid speech coding; interpolated magnitudes; linear-prediction filter; neural network classifier; nonsquare transform; parametric representation; pitch detection algorithm; pulse model; signal alignment; signed pulses; spectral magnitudes; spectrally shaped noise; stationary unvoiced speech; stationary voiced speech; subjective listening tests; synthetic phase model; time-domain waveform coder; transition speech excitation; waveform coding; weighted vector quantization; Bandwidth; Neural networks; Noise shaping; Phase noise; Power harmonic filters; Pulse shaping methods; Quantization; Speech coding; Speech enhancement; Time domain analysis;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.943341

Filename :

943341

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1522392