Title :
Companded quantization of speech MDCT coefficients
Author :
Nordén, Fredrik ; Hedelin, Per
Author_Institution :
Dept. of Commun. Technol., Aalborg Univ., Denmark
fDate :
3/1/2005 12:00:00 AM
Abstract :
We propose speech-coding procedures that achieve high subjective quality while avoiding speech-specific processing and the exploitation of interframe memory. The scheme is therefore well suited to packet-based voice communication and is also capable of coding generic audio. The architecture is based on a modified discrete cosine transform (MDCT) representation of the signal, and combines efficient vector quantization (VQ) techniques with psychoacoustic principles. Weighted quantization of the MDCT coefficients is performed using a codebook based on a statistical model of the multidimensional MDCT pdf. The weighting and the codebook are adapted for each frame to account for the masking thresholds given by a psychoacoustic analysis. The actual quantization is performed using lattices, thereby achieving close to rate-independent complexity. The result is a coding scheme that operates over a range of rates. A particular instance at 16 kbit/s, using a sampling frequency of 8 kHz, is shown to perform better than an LD-CELP coder operating at the same rate, even though no interframe memory is exploited.
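The idea of weighted lattice quantization mentioned in the abstract can be illustrated with a minimal sketch. This is not the codebook or compander of the paper: it assumes the simplest possible lattice (the integer lattice Z^n), assumes the per-coefficient masking thresholds are already given, and uses hypothetical function names (`quantize_weighted`, `dequantize_weighted`). Each coefficient is scaled by its threshold, rounded to the nearest lattice point, and scaled back, so the cost of quantizing does not depend on how many lattice points the rate allows.

```python
def quantize_weighted(coeffs, thresholds, step=1.0):
    """Map coefficients to integer-lattice indices after per-coefficient weighting.

    Assumption: `thresholds` holds one positive masking threshold per
    coefficient, and the lattice is Z^n (plain rounding), chosen for brevity.
    """
    if len(coeffs) != len(thresholds):
        raise ValueError("coeffs and thresholds must have the same length")
    if any(t <= 0 for t in thresholds):
        raise ValueError("thresholds must be positive")
    # Dividing by the threshold makes the allowed error uniform across
    # coefficients, so a single lattice can be used for the whole frame.
    return [round(c / (t * step)) for c, t in zip(coeffs, thresholds)]


def dequantize_weighted(indices, thresholds, step=1.0):
    """Invert the weighting: lattice index times threshold times step."""
    return [i * t * step for i, t in zip(indices, thresholds)]


# Illustrative values only, not data from the paper.
coeffs = [0.9, -2.3, 0.05, 4.0]
thresholds = [0.5, 1.0, 0.2, 2.0]

indices = quantize_weighted(coeffs, thresholds)
decoded = dequantize_weighted(indices, thresholds)
errors = [abs(c - d) for c, d in zip(coeffs, decoded)]
```

With `step=1.0`, the reconstruction error of each coefficient stays within half of its own threshold, which is the sense in which the quantization noise follows the masking curve in this simplified setting.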
Keywords :
audio coding; computational complexity; discrete cosine transforms; speech coding; statistical analysis; vector quantisation; voice communication; companded quantization; generic audio coding; interframe memory exploitation; modified discrete cosine transform; multidimensional MDCT; packet-based voice communication; psychoacoustic analysis; signal representation; speech MDCT coefficient; speech-coding procedure; speech-specific processing; vector quantization technique; weighted quantization; Discrete cosine transforms; Frequency; Lattices; Masking threshold; Multidimensional systems; Psychoacoustic models; Psychology; Sampling methods; Speech; Vector quantization; Audio coding; modified discrete cosine transform (MDCT); psychoacoustics; speech-coding; statistical modeling; vector quantization (VQ)
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2004.838535