Title :
Automatic speech decomposition and speech coding using MDCT-based hidden Markov chain and wavelet-based hidden Markov tree models
Author :
Tantibundhit, Charturong ; Boston, J. Robert ; Li, Ching Chung ; El-Jaroudi, Amro
Author_Institution :
Dept. of Electr. & Comput. Eng., Pittsburgh Univ., PA, USA
Abstract :
Speech is decomposed into three components, following the idea of Daudet and Torresani (Signal Processing, vol. 82, no. 11, pp. 1595, 2002): signal = tonal + transient + residual. The tonal and transient components are identified using a small number of coefficients of the modified discrete cosine transform (MDCT) and the wavelet transform, respectively. In the algorithm of Daudet and Torresani, referred to as the D&T algorithm, the significant MDCT and wavelet coefficients are determined by thresholds. The MDCT coefficients are assumed to be mutually independent, as are the wavelet coefficients. However, the MDCT coefficients probably have statistical dependencies, namely the clustering and persistence properties, and so do the wavelet coefficients. We propose a modification to the D&T algorithm that can capture these statistical dependencies by utilizing the hidden Markov model. The Viterbi and the maximum a posteriori (MAP) algorithms, used to find the optimal state distribution, are applied to determine the significant MDCT and wavelet coefficients automatically. The modified algorithm was used to encode 43 monosyllabic consonant-vowel-consonant (CVC) words and 3 sentences. Results showed that the modified algorithm improves the coding efficiency by 37% compared with the threshold method of the D&T algorithm when equal numbers of significant coefficients are used.
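The contrast between thresholding and state decoding can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-state hidden Markov chain over a one-dimensional coefficient sequence, zero-mean Gaussian emissions, and hand-picked parameters (`sigma_small`, `sigma_large`, `p_stay`, `p_large0` are all hypothetical names and values). The paper's models are trained, and its wavelet model is a tree rather than a chain.

```python
import math


def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - 0.5 * (x / sigma) ** 2


def viterbi_significance(coeffs, sigma_small=0.1, sigma_large=1.0,
                         p_stay=0.9, p_large0=0.5):
    """Label each coefficient 0 (small state) or 1 (large, "significant" state).

    Illustrative assumptions: two hidden states, zero-mean Gaussian emissions,
    a symmetric transition matrix, and fixed (untrained) parameters.
    """
    if not coeffs:
        return []
    sigmas = (sigma_small, sigma_large)
    log_trans = [[math.log(p_stay), math.log(1.0 - p_stay)],
                 [math.log(1.0 - p_stay), math.log(p_stay)]]
    log_init = [math.log(1.0 - p_large0), math.log(p_large0)]

    # score[s] = best log probability of any state path ending in state s
    score = [log_init[s] + gaussian_logpdf(coeffs[0], sigmas[s]) for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in coeffs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + log_trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + gaussian_logpdf(x, sigmas[s]))
        score = new_score
        back.append(ptr)

    # Trace the most likely state path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


coeffs = [0.01, -0.02, 0.03, 1.5, 0.2, 1.2, 0.02, -0.01]
hmm_labels = viterbi_significance(coeffs)
threshold_labels = [1 if abs(c) > 0.5 else 0 for c in coeffs]
print(hmm_labels)
print(threshold_labels)
```

In this toy sequence the coefficient 0.2 lies between two large neighbors. A fixed threshold of 0.5 discards it, whereas the decoded state path keeps it in the large state because leaving and re-entering that state is penalized by the transition probabilities. This is the clustering behavior the abstract refers to, shown here only under the stated assumptions.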
Keywords :
discrete cosine transforms; hidden Markov models; maximum likelihood estimation; speech coding; trees (mathematics); wavelet transforms; MAP algorithms; Viterbi algorithms; automatic speech decomposition; clustering properties; coding efficiency; hidden Markov chain; maximum a posteriori algorithms; modified discrete cosine transform; monosyllabic consonant-vowel-consonant words; optimal state distribution; persistence properties; speech coding; wavelet coefficients; wavelet transform; wavelet-based hidden Markov tree models; Clustering algorithms; Discrete cosine transforms; Discrete wavelet transforms; Hidden Markov models; Signal processing; Signal processing algorithms; Speech coding; Speech processing; Viterbi algorithm; Wavelet coefficients;
Conference_Title :
Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on
Print_ISBN :
0-7803-9154-3
DOI :
10.1109/ASPAA.2005.1540206