Regional Information Center for Science and Technology - Automatic speech decomposition and speech coding using MDCT-based hidden Markov chain and wavelet-based hidden Markov tree models

DocumentCode :

2520438

Title :

Automatic speech decomposition and speech coding using MDCT-based hidden Markov chain and wavelet-based hidden Markov tree models

Author :

Tantibundhit, Charturong ; Boston, J. Robert ; Li, Ching Chung ; El-Jaroudi, Amro

Author_Institution :

Dept. of Electr. & Comput. Eng., Pittsburgh Univ., PA, USA

fYear :

2005

fDate :

16-19 Oct. 2005

Firstpage :

207

Lastpage :

210

Abstract :

Speech is decomposed into three components, following the idea of Daudet and Torresani (Signal Processing, vol. 82, no. 11, pp. 1595, 2002): signal = tonal + transient + residual. The tonal and transient components are identified using a small number of coefficients of the modified discrete cosine transform (MDCT) and the wavelet transform, respectively. In the algorithm of Daudet and Torresani, referred to as the D&T algorithm, the significant MDCT and wavelet coefficients are determined by thresholds, and all MDCT coefficients, like all wavelet coefficients, are assumed to be independent. However, the MDCT coefficients probably have statistical dependencies, namely the clustering and persistence properties, and so do the wavelet coefficients. We propose a modification to the D&T algorithm that captures these statistical dependencies by using hidden Markov models. The Viterbi and the maximum a posteriori (MAP) algorithms, used to find the optimal state distribution, are applied to determine the significant MDCT and wavelet coefficients automatically. The modified algorithm was used to encode 43 monosyllabic consonant-vowel-consonant (CVC) words and 3 sentences. Results showed that the modified algorithm improves coding efficiency by 37% compared with the threshold method of the D&T algorithm when equal numbers of significant coefficients are used.
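
The idea of selecting significant coefficients with a hidden Markov chain rather than a fixed threshold can be illustrated with a minimal sketch. It assumes a two-state chain over a one-dimensional coefficient sequence, with zero-mean Gaussian emissions of small variance (insignificant) and large variance (significant), decoded with the Viterbi algorithm. The function names, the two-state Gaussian model, and all parameter values are assumptions made for illustration; the abstract does not give the authors' model details, and the wavelet-based hidden Markov tree is not covered here.

```python
import math


def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x * x) / (2.0 * sigma * sigma)


def viterbi_significance(coeffs, sigma_small=0.1, sigma_large=1.0,
                         p_stay=0.9, p_init_large=0.5):
    """Label each coefficient 0 (insignificant) or 1 (significant).

    Hypothetical two-state hidden Markov chain:
      state 0 = small-variance Gaussian, state 1 = large-variance Gaussian.
    p_stay is the probability of remaining in the same state, which
    models clustering of significant coefficients. All default values
    are illustrative assumptions, not values from the paper.
    """
    if not coeffs:
        return []
    sigmas = (sigma_small, sigma_large)
    log_trans = [[math.log(p_stay), math.log(1.0 - p_stay)],
                 [math.log(1.0 - p_stay), math.log(p_stay)]]
    log_init = (math.log(1.0 - p_init_large), math.log(p_init_large))

    # Best log score of any state path ending in each state at the first sample.
    score = [log_init[s] + gaussian_logpdf(coeffs[0], sigmas[s]) for s in (0, 1)]
    backptr = []
    for x in coeffs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[prev] + log_trans[prev][s] for prev in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best_prev] + gaussian_logpdf(x, sigmas[s]))
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)

    # Backtrack from the best final state to recover the full state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    demo = [0.01, -0.02, 0.03, 1.5, -2.0, 1.2, 0.02, -0.01]
    print(viterbi_significance(demo))
```

Unlike a per-coefficient threshold, the transition probabilities in this sketch let a small coefficient that sits between two large ones inherit the significant label, which is one way to express the clustering property mentioned above.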

Keywords :

discrete cosine transforms; hidden Markov models; maximum likelihood estimation; speech coding; trees (mathematics); wavelet transforms; MAP algorithms; Viterbi algorithms; automatic speech decomposition; clustering properties; coding efficiency; hidden Markov chain; maximum a posteriori algorithms; modified discrete cosine transform; monosyllabic consonant-vowel-consonant words; optimal state distribution; persistence properties; speech coding; wavelet coefficients; wavelet transform; wavelet-based hidden Markov tree models; Clustering algorithms; Discrete cosine transforms; Discrete wavelet transforms; Hidden Markov models; Signal processing; Signal processing algorithms; Speech coding; Speech processing; Viterbi algorithm; Wavelet coefficients;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on

Print_ISBN :

0-7803-9154-3

Type :

conf

DOI :

10.1109/ASPAA.2005.1540206

Filename :

1540206

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2520438