• DocumentCode
    1756872
  • Title

    Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model

  • Author

    Degottex, Gilles ; Stylianou, Yannis

  • Author_Institution
    Comput. Sci. Dept. & FORTH, Univ. of Crete, Heraklion, Greece
  • Volume
    21
  • Issue
    10
  • fYear
    2013
  • fDate
    Oct. 2013
  • Firstpage
    2085
  • Lastpage
    2095
  • Abstract
    Voice models often use frequency limits to split the speech spectrum into two or more voiced/unvoiced frequency bands. However, from the standpoint of voice production, the amplitude spectrum of the voiced source decreases smoothly without any abrupt frequency limit. Accordingly, multiband models struggle to estimate these limits and, as a consequence, artifacts can degrade the perceived quality. Using a linear frequency basis adapted to the non-stationarities of the speech signal, the Fan Chirp Transformation (FChT) has demonstrated harmonicity at frequencies higher than those usually observed with the DFT, which motivates full-band modeling. The previously proposed Adaptive Quasi-Harmonic model (aQHM) offers even more flexibility than the FChT by using a non-linear frequency basis. In the current paper, exploiting the properties of aQHM, we describe a full-band Adaptive Harmonic Model (aHM) along with detailed descriptions of its corresponding algorithms for the estimation of harmonics up to the Nyquist frequency. Formal listening tests show that the speech reconstructed using aHM is nearly indistinguishable from the original speech. Experiments with synthetic signals also show that the proposed aHM overall outperforms previous sinusoidal and harmonic models in terms of precision in estimating the sinusoidal parameters. Looking ahead, such precision is useful for building higher-level models on top of the sinusoidal parameters, such as spectral envelopes for speech synthesis.
  • Keywords
    discrete Fourier transforms; signal reconstruction; speech synthesis; DFT; FChT; Nyquist frequency; adaptive full-band harmonic model; adaptive quasi-harmonic model; fan chirp transformation; formal listening tests; frequency limits; full-band adaptive harmonic model; linear frequency; multiband models; nonlinear frequency; sinusoidal parameters; speech analysis; speech signal; synthetic signals; voice models; voiced-unvoiced frequency bands; Voice model; harmonic model; non-stationary; sinusoidal model
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2013.2266772
  • Filename
    6525352