• DocumentCode
    2520438
  • Title

    Automatic speech decomposition and speech coding using MDCT-based hidden Markov chain and wavelet-based hidden Markov tree models

  • Author

    Tantibundhit, Charturong ; Boston, J. Robert ; Li, Ching Chung ; El-Jaroudi, Amro

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Pittsburgh Univ., PA, USA
  • fYear
    2005
  • fDate
    16-19 Oct. 2005
  • Firstpage
    207
  • Lastpage
    210
  • Abstract
    Speech is decomposed into three components, following the idea of Daudet and Torresani (Signal Processing, vol. 82, no. 11, pp. 1595, 2002): signal = tonal + transient + residual. The tonal and transient components are identified using a small number of coefficients of the modified discrete cosine transform (MDCT) and the wavelet transform, respectively. In the algorithm of Daudet and Torresani, referred to as the D&T algorithm, the significant MDCT and wavelet coefficients are determined by thresholds. All MDCT coefficients are assumed to be independent, as are the wavelet coefficients. However, the MDCT coefficients probably have statistical dependencies, namely the clustering and persistence properties, and so do the wavelet coefficients. We propose a modification to the D&T algorithm that captures these statistical dependencies by using hidden Markov models. The Viterbi and maximum a posteriori (MAP) algorithms, used to find the optimal state distribution, are applied to determine the significant MDCT and wavelet coefficients automatically. The modified algorithm was used to encode 43 monosyllabic consonant-vowel-consonant (CVC) words and 3 sentences. Results showed that the modified algorithm improves the coding efficiency by 37% compared with the threshold method of the D&T algorithm when equal numbers of significant coefficients are used.
  • Keywords
    discrete cosine transforms; hidden Markov models; maximum likelihood estimation; speech coding; trees (mathematics); wavelet transforms; MAP algorithms; Viterbi algorithms; automatic speech decomposition; clustering properties; coding efficiency; hidden Markov chain; maximum a posteriori algorithms; modified discrete cosine transform; monosyllabic consonant-vowel-consonant words; optimal state distribution; persistence properties; speech coding; wavelet coefficients; wavelet transform; wavelet-based hidden Markov tree models; Clustering algorithms; Discrete cosine transforms; Discrete wavelet transforms; Hidden Markov models; Signal processing; Signal processing algorithms; Speech coding; Speech processing; Viterbi algorithm; Wavelet coefficients;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on
  • Print_ISBN
    0-7803-9154-3
  • Type

    conf

  • DOI
    10.1109/ASPAA.2005.1540206
  • Filename
    1540206
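
The abstract describes replacing per-coefficient thresholds with a hidden Markov model whose most likely state sequence marks the significant coefficients. The sketch below illustrates that idea for a one-dimensional chain only, and it is not the authors' implementation. It assumes a two-state model (0 = insignificant, 1 = significant) with zero-mean Gaussian emissions; the function name `viterbi_significance` and the parameters `sigma_small`, `sigma_large`, `p_stay`, and `p_large0` are hypothetical values chosen for illustration, not taken from the paper. The wavelet-based hidden Markov tree is not covered.

```python
import numpy as np


def viterbi_significance(coeffs, sigma_small=0.1, sigma_large=1.0,
                         p_stay=0.9, p_large0=0.5):
    """Label each coefficient 0 (insignificant) or 1 (significant).

    Assumed model (illustrative only): a two-state hidden Markov chain
    with zero-mean Gaussian emissions of standard deviation sigma_small
    (state 0) or sigma_large (state 1), and probability p_stay of
    remaining in the current state from one coefficient to the next.
    """
    x = np.asarray(coeffs, dtype=float)
    n = x.size
    if n == 0:
        return np.empty(0, dtype=int)

    sigmas = np.array([sigma_small, sigma_large])
    # Log-likelihood of every coefficient under each state, shape (n, 2).
    log_emit = (-0.5 * np.log(2.0 * np.pi * sigmas ** 2)[None, :]
                - x[:, None] ** 2 / (2.0 * sigmas ** 2)[None, :])
    # log_trans[i, j] is the log-probability of moving from state i to j.
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    log_init = np.log(np.array([1.0 - p_large0, p_large0]))

    delta = np.empty((n, 2))            # best log-score ending in each state
    back = np.zeros((n, 2), dtype=int)  # best predecessor for each state
    delta[0] = log_init + log_emit[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: i -> j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], [0, 1]] + log_emit[t]

    # Trace the most likely state sequence backwards.
    states = np.empty(n, dtype=int)
    states[-1] = int(np.argmax(delta[-1]))
    for t in range(n - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states
```

Calling `viterbi_significance` on a sequence of transform coefficients returns one label per coefficient. Because leaving a state is penalized, a moderately small coefficient surrounded by large ones tends to keep the "significant" label, which is the clustering behavior a fixed threshold cannot express.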