DocumentCode
2520438
Title
Automatic speech decomposition and speech coding using MDCT-based hidden Markov chain and wavelet-based hidden Markov tree models
Author
Tantibundhit, Charturong ; Boston, J. Robert ; Li, Ching Chung ; El-Jaroudi, Amro
Author_Institution
Dept. of Electr. & Comput. Eng., Pittsburgh Univ., PA, USA
fYear
2005
fDate
16-19 Oct. 2005
Firstpage
207
Lastpage
210
Abstract
Speech is decomposed into three components, based on the idea of Daudet and Torresani (Signal Processing, vol. 82, no. 11, pp. 1595, 2002), as signal = tonal + transient + residual. The tonal and transient components are identified using a small number of coefficients of the modified discrete cosine transform (MDCT) and the wavelet transform, respectively. In the algorithm of Daudet and Torresani, referred to as the D&T algorithm, the significant MDCT and wavelet coefficients are determined by thresholds. All MDCT coefficients are assumed to be independent, as are all wavelet coefficients. However, the MDCT coefficients probably have statistical dependencies, namely the clustering and persistence properties, and so do the wavelet coefficients. We propose a modification to the D&T algorithm that captures these statistical dependencies by using hidden Markov models. The Viterbi and maximum a posteriori (MAP) algorithms, used to find the optimal state distribution, are applied to determine the significant MDCT and wavelet coefficients automatically. The modified algorithm was used to encode 43 monosyllabic consonant-vowel-consonant (CVC) words and 3 sentences. Results showed that the modified algorithm improves the coding efficiency by 37% compared with the threshold method of the D&T algorithm when equal numbers of significant coefficients are used.
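The use of the Viterbi algorithm to label coefficients as significant can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no model parameters, so the two-state chain, the zero-mean Gaussian emissions, and every name and default value below (`viterbi_significance`, `sigma_small`, `sigma_large`, `p_stay`, `p_large_init`) are assumptions chosen for illustration. It covers only a chain over a one-dimensional coefficient sequence, not the wavelet-based hidden Markov tree.

```python
import math


def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x * x) / (2.0 * sigma * sigma)


def viterbi_significance(coeffs, sigma_small=0.1, sigma_large=1.0,
                         p_stay=0.9, p_large_init=0.5):
    """Label each coefficient 0 (not significant) or 1 (significant).

    Hypothetical model: state 0 emits small-variance coefficients, state 1
    emits large-variance coefficients, and p_stay is the probability of
    remaining in the same state (this encodes persistence along the chain).
    """
    if not coeffs:
        return []
    sigmas = (sigma_small, sigma_large)
    log_init = (math.log(1.0 - p_large_init), math.log(p_large_init))
    log_trans = [[math.log(p_stay), math.log(1.0 - p_stay)],
                 [math.log(1.0 - p_stay), math.log(p_stay)]]

    # score[s]: best log-probability of any state path ending in state s.
    score = [log_init[s] + gaussian_logpdf(coeffs[0], sigmas[s]) for s in (0, 1)]
    backptr = []
    for x in coeffs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + log_trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + gaussian_logpdf(x, sigmas[s]))
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the most likely path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi_significance([0.01, -0.02, 1.5, -1.2, 0.9, 0.03, 0.0])
```

Unlike a fixed threshold applied to each coefficient independently, the transition probabilities make the label of one coefficient depend on its neighbours, which is the kind of dependency the abstract refers to.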
Keywords
discrete cosine transforms; hidden Markov models; maximum likelihood estimation; speech coding; trees (mathematics); wavelet transforms; MAP algorithms; Viterbi algorithms; automatic speech decomposition; clustering properties; coding efficiency; hidden Markov chain; maximum a posteriori algorithms; modified discrete cosine transform; monosyllabic consonant-vowel-consonant words; optimal state distribution; persistence properties; speech coding; wavelet coefficients; wavelet transform; wavelet-based hidden Markov tree models; Clustering algorithms; Discrete cosine transforms; Discrete wavelet transforms; Hidden Markov models; Signal processing; Signal processing algorithms; Speech coding; Speech processing; Viterbi algorithm; Wavelet coefficients;
fLanguage
English
Publisher
IEEE
Conference_Titel
Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on
Print_ISBN
0-7803-9154-3
Type
conf
DOI
10.1109/ASPAA.2005.1540206
Filename
1540206