• DocumentCode
    3431306
  • Title
    A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform
  • Author
    Ribeiro, Manuel Sam ; Clark, Robert A. J.
  • Author_Institution
    Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4909
  • Lastpage
    4913
  • Abstract
    We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling low frequencies at higher levels and high frequencies at lower levels. Objective results indicate that this representation improves f0 prediction over traditional short-term approaches. Subjective results show improvements over the typical MSD-HMM and performance comparable to the recently proposed CWT-HMM, while using fewer parameters. These results are discussed and future lines of research are proposed.
  • Keywords
    discrete cosine transforms; speech synthesis; CWT; DCT; continuous wavelet transform; cosine functions; discrete cosine transform; higher level representations; multilevel representation; selected frequencies; signal decomposition; statistical parametric speech synthesis techniques; Continuous wavelet transforms; Hidden Markov models; Speech; HMM-based synthesis; f0 modeling; prosody
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178904
  • Filename
    7178904
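
The abstract describes the DCT as compactly representing a contour as a weighted sum of cosine functions. The following is a minimal sketch of that idea only, not the authors' implementation: it applies an orthonormal DCT-II to a synthetic rise-fall log-f0 contour and reconstructs it from a few leading coefficients. The contour, the frame count, and the function names `dct2`, `idct2`, and `rmse` are assumptions made for illustration, and the CWT decomposition step is not shown.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def idct2(coeffs, n):
    """Inverse of dct2; accepts a truncated coefficient list (at least one)."""
    out = []
    for t in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, len(coeffs)):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Hypothetical log-f0 contour: a single rise-fall shape over 50 frames.
n_frames = 50
contour = [5.0 + 0.3 * math.sin(math.pi * t / (n_frames - 1)) for t in range(n_frames)]

coeffs = dct2(contour)

# Reconstruct from the first K coefficients and measure the error.
err_full = rmse(contour, idct2(coeffs, n_frames))
err_k4 = rmse(contour, idct2(coeffs[:4], n_frames))
err_k1 = rmse(contour, idct2(coeffs[:1], n_frames))

print("RMSE with all coefficients:", err_full)
print("RMSE with 4 coefficients:  ", err_k4)
print("RMSE with 1 coefficient:   ", err_k1)
```

Because the transform is orthonormal, keeping more leading coefficients never increases the reconstruction error, which is why a short coefficient vector can serve as a compact description of a smooth contour.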