A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform

Author

Ribeiro, Manuel Sam ; Clark, Robert A. J.

Author_Institution

Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

fYear

2015

fDate

19-24 April 2015

Firstpage

4909

Lastpage

4913

Abstract

We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling low frequencies at higher levels and high frequencies at lower levels. Objective results indicate that this representation improves f0 prediction over traditional short-term approaches. Subjective results show improvements over the typical MSD-HMM and performance comparable to the recently proposed CWT-HMM, while using fewer parameters. These results are discussed and future lines of research are proposed.
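The two transforms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Mexican hat wavelet, the synthetic contour, the two scales, and the choice of eight DCT coefficients are all assumptions made for illustration, and the helper names (dct2, idct2, cwt_scale) are hypothetical.

```python
import math

def dct2(x):
    """Orthonormal DCT-II: weights of the cosine functions that sum to x."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c, n):
    """Inverse of dct2 for n samples; coefficients missing from c count as zero."""
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, len(c)):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def mexican_hat(t):
    """Mexican hat mother wavelet (assumed here; the record does not name one)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_scale(x, scale):
    """One CWT scale: correlate x with the wavelet dilated by `scale`.

    The wavelet is truncated at 5 * scale samples on each side, and samples
    outside the signal are treated as zero.
    """
    n = len(x)
    half = int(5 * scale)
    out = []
    for b in range(n):
        s = 0.0
        for j in range(max(0, b - half), min(n, b + half + 1)):
            s += x[j] * mexican_hat((j - b) / scale)
        out.append(s / math.sqrt(scale))
    return out

# Synthetic stand-in for an f0 contour: one slow and one fast component.
n = 100
f0 = [math.sin(2 * math.pi * i / n) + 0.3 * math.sin(2 * math.pi * i / 10)
      for i in range(n)]

# Decompose into two illustrative scales, then keep 8 DCT coefficients per scale.
scales = [2.0, 8.0]
levels = {s: cwt_scale(f0, s) for s in scales}
compact = {s: dct2(levels[s])[:8] for s in scales}

# Smoothed approximation of each scale from its truncated DCT coefficients.
approx = {s: idct2(compact[s], n) for s in scales}
```

In this sketch each scale of the decomposition is reduced from 100 samples to 8 cosine weights, which is the sense in which the DCT gives a compact description of a contour. How the paper assigns scales to linguistic levels is not stated in this record and is not modeled here.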

Keywords

discrete cosine transforms; speech synthesis; CWT; DCT; continuous wavelet transform; cosine functions; discrete cosine transform; higher level representations; multilevel representation; selected frequencies; signal decomposition; statistical parametric speech synthesis techniques; continuous wavelet transforms; hidden Markov models; speech; HMM-based synthesis; f0 modeling; prosody

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178904

Filename

7178904