Title :
A Scalable Phonetic Vocoder Framework Using Joint Predictive Vector Quantization of MELP Parameters
Author_Institution :
MIT Lincoln Lab.
Abstract :
We present the framework for a scalable phonetic vocoder (SPV) capable of operating at bit rates from 300 to 1100 bps. The underlying system uses an HMM-based phonetic speech recognizer to estimate the parameters for MELP speech synthesis. We extend this baseline technique in three ways. First, we introduce the concept of predictive time evolution to generate a smoother path for the synthesizer parameters, and show that it improves speech quality. Second, since the output speech from the phonetic vocoder is still limited by such low bit rates, we propose a scalable system in which the accuracy of the MELP parameters is increased by vector quantizing the error signal between the true and phonetic-estimated MELP parameters. Finally, we apply an extremely flexible technique for exploiting correlations in these parameters over time, which we call joint predictive vector quantization (JPVQ). We show that significant quality improvement can be attained by adding as few as 400 bps to the baseline phonetic vocoder using JPVQ. The resulting SPV system provides a flexible platform for adjusting the phonetic vocoder bit rate and speech quality.
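The second extension in the abstract, vector quantizing the error between the true and phonetic-estimated MELP parameters, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the two-dimensional toy parameter vectors, the four-entry codebook, and the frame rate of 50 frames per second. The sketch shows plain residual vector quantization only; it does not model the joint predictive (JPVQ) step or the predictive time evolution.

```python
import math


def nearest_codeword(residual, codebook):
    """Return the index of the codeword closest to `residual` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((r - c) ** 2 for r, c in zip(residual, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_residual(true_params, estimated_params, codebook):
    """Encoder side: quantize the error between true and estimated parameters."""
    residual = [t - e for t, e in zip(true_params, estimated_params)]
    return nearest_codeword(residual, codebook)


def decode_residual(estimated_params, index, codebook):
    """Decoder side: refine the estimate by adding the transmitted codeword."""
    return [e + c for e, c in zip(estimated_params, codebook[index])]


def added_bit_rate(codebook_size, frames_per_second):
    """Extra bits per second needed to send one codebook index per frame."""
    return math.log2(codebook_size) * frames_per_second


# Hypothetical toy values, chosen only to make the example concrete.
codebook = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, -0.5]]
true_params = [1.4, 2.1]        # parameters measured from the input speech
estimated_params = [1.0, 2.0]   # parameters predicted from the phone sequence

index = encode_residual(true_params, estimated_params, codebook)
refined_params = decode_residual(estimated_params, index, codebook)
extra_bps = added_bit_rate(len(codebook), frames_per_second=50)
```

In this toy setup the decoder only needs the codebook index in addition to the phonetic stream, so the added bit rate grows with the logarithm of the codebook size. That is the sense in which a larger codebook trades bit rate for parameter accuracy.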
Keywords :
speech coding; speech recognition; speech synthesis; vector quantisation; vocoders; HMM; MELP parameters; joint predictive vector quantization; phonetic speech recognizer; predictive vector quantization; scalable phonetic vocoder framework; speech quality; Adaptive filters; Bit rate; Hidden Markov models; Linear predictive coding; Speech recognition; Speech synthesis; State estimation; Synthesizers; Vector quantization; Vocoders
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660118