  • DocumentCode
    2470
  • Title
    Symbolic Modeling of Prosody: From Linguistics to Statistics
  • Author
    Obin, Nicolas; Lanchantin, Pierre
  • Author_Institution
    UPMC, Paris, France
  • Volume
    23
  • Issue
    3
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    588
  • Lastpage
    599
  • Abstract
    The assignment of prosodic events (accent and phrasing) from text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in text-to-speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a text. Then, a novel statistical representation based on a hierarchical HMM (HHMM) is used to model the prosodic structure of a text: the root layer represents the text, each intermediate layer represents a sequence of intermediate phrases, the pre-terminal layer represents the sequence of accents, and the terminal layer represents the sequence of linguistic contexts. For each intermediate layer, a segmental HMM and information fusion are used to combine the linguistic and metric constraints for the segmentation of a text into phrases. A set of experiments conducted on multi-speaker databases with various speaking styles shows that the rich linguistic representation drastically improves the assignment of prosodic events, and that the fusion of linguistic and metric constraints significantly improves on standard methods for the segmentation of a text into phrases. These results constitute substantial advances that can be further used to model the speech prosody of a speaker, a speaking style, and emotions for text-to-speech synthesis.
  • Keywords
    hidden Markov models; linguistics; speech synthesis; accent event; hierarchical HMM; hierarchical hidden Markov model; information fusion; linguistic constraint; linguistic description; linguistic processing chain; metric constraint; phrasing event; prosodic events; prosody symbolic modeling; segmental HMM; statistical representation; statistics; text prosodic structure; text segmentation; text-to-speech synthesis system; Context; Measurement; Pragmatics; Speech; Speech processing; Syntactics; Dempster-Shafer fusion; hierarchical HMMs; segmental HMMs; speaking style; speech prosody; surface/deep syntactic parsing; text-to-speech synthesis
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2387389
  • Filename
    7001250