A Sinusoidal Voice Over Packet Coder Tailored for the Frame-Erasure Channel

Author

Lindblom, Jonas

Author_Institution

Skype Technol., Stockholm, Sweden

Volume

13

Issue

5

fYear

2005

Firstpage

787

Lastpage

798

Abstract

A speech coder tailored especially for the frame-erasure channel—the sinusoidal voice over packet coder (SVOPC)—is proposed. Based on a classified approach, avoiding interframe coding techniques, and synthesizing its output from slowly varying parameters, the coder is inherently robust to packet loss. SVOPC is based on quasi-harmonic modeling of the linear prediction (LP) residual. Both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models. A wide-band (16-kHz sampling frequency) implementation of the coder provides synthesized speech of good subjective quality at around 20 kbps. SVOPC is evaluated by means of subjective listening tests, and compared to a reference system based on G.722.2 (the AMR wide-band codec). Under frame erasure conditions (5%–30% frame erasures generated according to a Gilbert model), SVOPC clearly outperforms G.722.2.
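The abstract states that frame erasures were generated according to a Gilbert model but gives no parameters. As a rough illustration only, the sketch below simulates a simple two-state Gilbert channel in which every frame is erased while the channel is in the bad state. The function name `gilbert_erasures`, the parameterization by target loss rate and mean burst length, and the example values are assumptions made for illustration, not the settings used in the paper.

```python
import random


def gilbert_erasures(n_frames, loss_rate, mean_burst, seed=0):
    """Simulate a two-state Gilbert frame-erasure pattern.

    Returns a list of booleans, True where a frame is erased.
    Hypothetical helper: the parameterization is an assumption,
    not taken from the paper.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must lie strictly between 0 and 1")
    if mean_burst < 1.0:
        raise ValueError("mean_burst must be at least 1 frame")

    # q = P(bad -> good); the mean burst length of erasures is 1 / q.
    q = 1.0 / mean_burst
    # p = P(good -> bad), solved from the stationary loss rate p / (p + q).
    p = q * loss_rate / (1.0 - loss_rate)
    if p > 1.0:
        raise ValueError("loss_rate and mean_burst are incompatible")

    rng = random.Random(seed)
    # Start in the stationary distribution so the pattern has no warm-up bias.
    bad = rng.random() < loss_rate
    erased = []
    for _ in range(n_frames):
        erased.append(bad)
        if bad:
            bad = rng.random() >= q  # remain in the bad state with prob. 1 - q
        else:
            bad = rng.random() < p   # enter the bad state with prob. p
    return erased


if __name__ == "__main__":
    # Example values only: 10% erasures with an average burst of 2 frames.
    pattern = gilbert_erasures(n_frames=100_000, loss_rate=0.10, mean_burst=2.0)
    print(f"empirical erasure rate: {sum(pattern) / len(pattern):.3f}")
```

Parameterizing by loss rate and mean burst length, rather than by the raw transition probabilities, makes it easy to sweep the 5%–30% erasure range mentioned in the abstract while holding burstiness fixed.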

Keywords

channel coding; linear predictive coding; speech coding; speech synthesis; vocoders; Gaussian mixture model; frame erasure channel; interframe coding technique; linear prediction residual; quasi harmonic modeling; sinusoidal voice over packet coder; speech coder; Frequency synthesizers; IP networks; Predictive models; Protocols; Quality of service; Robustness; Sampling methods; Wideband; Frame-erasure; harmonic analysis; packet loss concealment; packet switching; variable-dimension; vector quantization; wide-band

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/TSA.2005.851913

Filename

1495463