Title :
Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding
Author_Institution :
Inst. de la Commun. Parlee, INPG/Univ. Stendhal/CNRS, Grenoble, France
fDate :
5/1/2004 12:00:00 AM
Abstract :
A key problem for videophony, that is, telephony that processes images of the speaker's face in addition to acoustic speech, concerns signal compression for transmission. In such systems, audio and video compression are performed separately, using distinct audio and video coders. In this paper, an audio-visual approach to this problem is considered, since we claim that the fundamental property of coherence (redundancy) between the two modalities of speech should be exploited by coding systems. We consider the framework of parametric analysis, modeling and synthesis of talking faces, which allows efficient representation of video information. Thus, we propose to jointly encode several face parameters, namely lip shape geometric descriptors, together with sets of audio coefficients, namely standard LPC parameters. Defining an audiovisual distance between vectors of concatenated audio and video parameters makes it possible to generate audiovisual single-stage vector and matrix quantizers using the generalized Lloyd algorithm. Calculation of video and audio mean distortion measures shows a significant gain in quantization accuracy and/or resolution compared to separate video and audio quantization. An alternative sub-optimal tree-like structure for audiovisual joint coding is also tested and yields interesting results while decreasing the computational complexity of the quantization process.
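The joint quantization idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the form of the audiovisual distance, so the code assumes a weighted sum of squared Euclidean distances over the audio part and the video part of each concatenated vector. The function names (`av_distance`, `train_joint_vq`, `quantize`) and the weights `w_audio` and `w_video` are hypothetical. Under this assumed distance, the centroid update of the generalized Lloyd algorithm reduces to the plain mean of each cell.

```python
import numpy as np

def av_distance(x, codebook, n_audio, w_audio=1.0, w_video=1.0):
    """Assumed audiovisual distance from one vector x to every codeword.

    The first n_audio components are treated as audio (e.g. LPC-derived)
    parameters, the remaining ones as video (e.g. lip shape) parameters.
    """
    diff = codebook - x
    d_audio = np.sum(diff[:, :n_audio] ** 2, axis=1)
    d_video = np.sum(diff[:, n_audio:] ** 2, axis=1)
    return w_audio * d_audio + w_video * d_video

def train_joint_vq(data, n_codewords, n_audio,
                   w_audio=1.0, w_video=1.0, n_iter=20, seed=0):
    """Generalized Lloyd iterations on concatenated audio+video vectors."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with distinct training vectors.
    idx = rng.choice(len(data), size=n_codewords, replace=False)
    codebook = data[idx].astype(float).copy()
    for _ in range(n_iter):
        # Nearest-neighbour step: assign each vector to its closest codeword.
        labels = np.array([
            np.argmin(av_distance(x, codebook, n_audio, w_audio, w_video))
            for x in data
        ])
        # Centroid step: the mean minimises a weighted squared distance.
        for k in range(n_codewords):
            members = data[labels == k]
            if len(members) > 0:  # keep the old codeword if the cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook

def quantize(x, codebook, n_audio, w_audio=1.0, w_video=1.0):
    """Return the index of the codeword closest to x under the assumed distance."""
    return int(np.argmin(av_distance(x, codebook, n_audio, w_audio, w_video)))
```

In this sketch a single transmitted index selects both the audio and the video parameters, which is how a joint quantizer can exploit redundancy between the two modalities. A matrix quantizer would follow the same pattern with several consecutive frames stacked into each training vector.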
Keywords :
audio coding; audio-visual systems; computational complexity; concatenated codes; speech coding; vector quantisation; video coding; videotelephony; LPC coefficients; Lloyd algorithm; acoustic speech; audio coders; audio compression; audio-visual approach; coherence property; computational complexity; concatenated audio parameters; concatenated video parameters; face parameters; image processing; joint matrix quantization; lip-shape geometric descriptors; low bit rate audiovisual speech coding; matrix quantizers; parametric analysis; signal compression; signal transmission; speech modalities; speech processing; telephony; vector quantizers; video coders; video compression; videophony; Bit rate; Distortion measurement; Face; Linear predictive coding; Quantization; Signal processing; Speech coding; Speech processing; Telephony; Video compression;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2003.822626