• DocumentCode
    730696
  • Title
    Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
  • Author
    Zen, Heiga; Sak, Hasim
  • Author_Institution
    Google Inc., Mountain View, CA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4470
  • Lastpage
    4474
  • Abstract
    Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications, including acoustic modeling for statistical parametric speech synthesis. One concern in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs with a recurrent output layer. The use of a unidirectional RNN architecture allows frame-synchronous streaming inference of output acoustic features given input linguistic features. The recurrent output layer further encourages smooth transitions between acoustic features at consecutive frames. Experimental results from subjective listening tests show that the proposed architecture can synthesize natural-sounding speech without requiring utterance-level batch processing.
  • Keywords
    inference mechanisms; recurrent neural nets; speech synthesis; statistical analysis; acoustic modeling; consecutive frames; frame-synchronous streaming inference; input linguistic features; low-latency speech synthesis; output acoustic features; recurrent output layer; statistical parametric speech synthesis; streaming speech synthesis architecture; subjective listening tests; text-to-speech applications; unidirectional LSTM-RNN; unidirectional long short-term memory recurrent neural network; Acoustics; Hidden Markov models; Neural networks; Pragmatics; Smoothing methods; Speech; Speech synthesis; Statistical parametric speech synthesis; long short-term memory; low-latency; recurrent neural networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178816
  • Filename
    7178816
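
The frame-synchronous streaming inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `StreamingSynthesizer`, the helper `stream`, the layer sizes, the random weights, and the omission of bias terms are all assumptions made for illustration. The recurrent output layer is assumed here to take the linear form y_t = W_hy h_t + W_yy y_{t-1}, so each output frame depends on the previous output frame as well as on the current hidden state.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class StreamingSynthesizer:
    """Hypothetical model: one unidirectional LSTM layer plus a recurrent
    linear output layer. Weights are random and biases are omitted."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One matrix per gate (input, forget, candidate, output), each applied
        # to the concatenation [x_t, h_{t-1}].
        self.gates = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifgo"}
        self.W_hy = rand_matrix(n_out, n_hidden, rng)  # hidden -> output
        self.W_yy = rand_matrix(n_out, n_out, rng)     # previous output -> output
        self.h = [0.0] * n_hidden
        self.c = [0.0] * n_hidden
        self.y = [0.0] * n_out

    def step(self, x):
        """Consume one frame of linguistic features, emit one acoustic frame."""
        z = list(x) + self.h
        i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
        f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
        g = [math.tanh(v) for v in matvec(self.gates["g"], z)]
        o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
        self.c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, self.c, i, g)]
        self.h = [oi * math.tanh(ci) for oi, ci in zip(o, self.c)]
        # Recurrent output layer: the previous output frame feeds the current one.
        hy = matvec(self.W_hy, self.h)
        yy = matvec(self.W_yy, self.y)
        self.y = [a + b for a, b in zip(hy, yy)]
        return self.y


def stream(model, frames):
    """Yield one output frame per input frame, using past context only."""
    for x in frames:
        yield model.step(x)
```

Because the network is unidirectional, each output frame is available as soon as its input frame arrives; the first k outputs are unchanged no matter how many frames follow, which is what removes the need for utterance-level batch processing.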