• DocumentCode
    1687252
  • Title
    F0 contour prediction with a deep belief network-Gaussian process hybrid model
  • Author
    Fernandez, Raul ; Rendel, Asaf ; Ramabhadran, Bhuvana ; Hoory, Ron
  • Author_Institution
    IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
  • fYear
    2013
  • Firstpage
    6885
  • Lastpage
    6889
  • Abstract
    In this work we look at using non-parametric, exemplar-based regression for the prediction of prosodic contour targets from textual features in a speech synthesis system. We investigate the performance of Gaussian Process regression on this task when the covariance kernel operates on a variety of input feature spaces. In particular, we consider non-linear features extracted via Deep Belief Networks. We motivate the use of this hybrid model by considering the initial deep-layer model as a feature extractor that can summarize high-level structure from the raw inputs to improve the regression of an exemplar-based model in the second part of the approach. By looking at both objective metrics and perceptual listening tests, we evaluate these proposals against each other, and against the standard clustering-tree techniques implemented in parametric synthesis for the prediction of prosodic targets.
  • Keywords
    Gaussian processes; feature extraction; nonparametric statistics; regression analysis; speech synthesis; F0 contour prediction; Gaussian process regression; clustering-tree techniques; covariance kernel; deep belief network-Gaussian process hybrid model; deep-layer model; high-level structure; nonlinear feature extraction; nonparametric exemplar-based regression; objective metrics; parametric synthesis; perceptual listening tests; prosodic contour target prediction; speech synthesis system; Artificial neural networks; Context; Hidden Markov models; Predictive models; Training; intonation generation; neural networks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6638996
  • Filename
    6638996