• DocumentCode
    178424
  • Title

    A novel pitch decomposition method for the generalized linear alignment model

  • Author

    Langarani, Mahsa Sadat Elyasi ; Klabbers, Esther ; van Santen, Jan

  • Author_Institution
    Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2584
  • Lastpage
    2588
  • Abstract
    Superpositional models of intonation typically propose decomposing fundamental frequency (F0) contours into phrase curves and accent curves, aligned with phrases and left-headed feet, respectively. Extracting these component curves from F0 contours without making undue assumptions is challenging. We propose a novel method for decomposing pitch curves, based on the assumption that accent curves can be described by combining skewed normal distributions and sigmoid functions. In contrast to an earlier pitch decomposition algorithm (“PRISM”), this allows for simple joint optimization of phrase and accent curve parameters, using fewer parameters. The proposed method was evaluated on three speech corpora containing: (1) synthetically generated pitch curves, (2) all-sonorant utterances, and (3) utterances containing both sonorant and non-sonorant speech sounds. The root weighted mean squared error is small, and, on the corpus for which comparable data are available, is significantly smaller than for PRISM.
  • Keywords
    mean square error methods; speech synthesis; text analysis; accent curves; all-sonorant utterances; component curves extraction; fundamental frequency contours; generalized linear alignment model; intonation; joint optimization; left-headed feet; nonsonorant speech sounds; phrase curves; pitch curves decomposition; root weighted mean squared error; sigmoid functions; skewed normal distributions; speech corpora; superpositional models; synthetically generated pitch curves; Conferences; Equations; Foot; Mathematical model; Protocols; Speech; Speech synthesis; prosody modeling; superpositional model; text-to-speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854067
  • Filename
    6854067
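
The abstract names three ingredients: skewed normal distributions, sigmoid functions, and a root weighted mean squared error. The following is a minimal sketch of those ingredients only, not the authors' implementation. The record does not state how the two function families are combined, how the phrase curve is parameterized, or how the error weights are chosen, so the weighted sum in `accent_curve`, the linear `phrase_curve`, and all parameter names are hypothetical.

```python
import math
import numpy as np


def skew_normal(t, loc, scale, shape):
    """Skew-normal density: (2 / scale) * phi(z) * Phi(shape * z), z = (t - loc) / scale."""
    z = (np.asarray(t, dtype=float) - loc) / scale
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + np.vectorize(math.erf)(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi


def sigmoid(t, midpoint, slope):
    """Logistic sigmoid rising from 0 to 1 around `midpoint`."""
    return 1.0 / (1.0 + np.exp(-slope * (np.asarray(t, dtype=float) - midpoint)))


def accent_curve(t, amp, loc, scale, shape, step, midpoint, slope):
    """ASSUMPTION: one possible combination, a scaled skew-normal bump plus a scaled sigmoid."""
    return amp * skew_normal(t, loc, scale, shape) + step * sigmoid(t, midpoint, slope)


def phrase_curve(t, start, end):
    """ASSUMPTION: a straight line from `start` to `end`, used only as a placeholder."""
    t = np.asarray(t, dtype=float)
    return start + (end - start) * (t - t[0]) / (t[-1] - t[0])


def rwmse(f0, model, weights):
    """Root weighted mean squared error: sqrt(sum(w * (f0 - model)^2) / sum(w))."""
    f0 = np.asarray(f0, dtype=float)
    model = np.asarray(model, dtype=float)
    w = np.asarray(weights, dtype=float)
    return math.sqrt(np.sum(w * (f0 - model) ** 2) / np.sum(w))


# Superposition: the modelled contour is the phrase curve plus the accent curve.
t = np.linspace(0.0, 1.0, 101)
contour = phrase_curve(t, 120.0, 100.0) + accent_curve(t, 5.0, 0.4, 0.1, 3.0, 0.0, 0.5, 10.0)
```

In a sketch like this, the weights passed to `rwmse` could down-weight frames with unreliable pitch estimates, for example in non-sonorant regions; that choice is also an assumption rather than something the record specifies.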