• DocumentCode
    312284
  • Title
    Goethe for prosody
  • Author
    Rapp, Stefan
  • Author_Institution
    Inst. für Maschinelle Sprachverarbeitung, Stuttgart Univ., Germany
  • Volume
    3
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    1636
  • Abstract
    We describe how a recording of Goethe’s “Die Leiden des jungen Werther”, published on a multimedia CD-ROM (J.W. Goethe, 1995), was made accessible for prosody research. The recording is interesting for prosody research because of its prosodic richness: it displays a large variety of registers and speaking styles. Application areas are: development of prosody models for German TTS, unsupervised learning of pitch accent types, corpus search for research on prosody semantics and the prosody-syntax interaction, and the study of more global prosodic parameters (speaking rate, pitch range) that define registers or speaking styles. The four-hour recording was segmented into phonemes, syllables and words using HMM speech recognition techniques (S. Rapp, 1995) together with a large pronunciation lexicon (R.H. Baayen et al., 1993). A part-of-speech tagger (H. Schmid, 1995) was applied to the corpus to yield time-aligned POS tags. The German adaptation of the tone sequence model of intonation used in Stuttgart (J. Mayer, 1995; C. Féry, 1993) inspired the parametrization of the fundamental frequency (F0). An intermediate phonetic representation layer is described that uses the syllable alignment to parametrize the F0 contour as a superposition of three functions: a hyperbolic tangent, a Gaussian and a constant.
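The abstract names the three components of the F0 parametrization but not the exact formula. The Python sketch below assumes one plausible form, F0(t) = c + A·tanh((t − t0)/w) + B·exp(−(t − μ)²/(2σ²)), where the tanh term models a rising or falling step, the Gaussian models an accent-like bump, and the constant sets the baseline. The class and parameter names (`F0Params`, `tanh_amp`, `gauss_center`, and so on), the use of absolute time in seconds, and the 10 ms default frame step are illustrative assumptions, not definitions taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class F0Params:
    """Hypothetical parameter set for one syllable-aligned F0 segment (Hz, seconds)."""
    constant: float      # baseline F0 level, in Hz
    tanh_amp: float      # half the height of the step (negative = falling), in Hz
    tanh_center: float   # time of the step midpoint, in seconds
    tanh_width: float    # time scale of the step, in seconds (must be > 0)
    gauss_amp: float     # peak height of the bump, in Hz
    gauss_center: float  # time of the bump peak, in seconds
    gauss_width: float   # standard deviation of the bump, in seconds (must be > 0)


def f0_model(t: float, p: F0Params) -> float:
    """Evaluate the assumed superposition constant + tanh step + Gaussian bump at time t."""
    step = p.tanh_amp * math.tanh((t - p.tanh_center) / p.tanh_width)
    bump = p.gauss_amp * math.exp(-((t - p.gauss_center) ** 2) / (2 * p.gauss_width ** 2))
    return p.constant + step + bump


def sample_contour(p: F0Params, start: float, end: float,
                   frame: float = 0.01) -> list[tuple[float, float]]:
    """Sample the model every `frame` seconds between two (syllable) boundaries, inclusive."""
    n = int(round((end - start) / frame))
    return [(start + i * frame, f0_model(start + i * frame, p)) for i in range(n + 1)]


if __name__ == "__main__":
    # Hypothetical values: an accent bump at 0.15 s followed by a falling step at 0.30 s.
    params = F0Params(constant=120.0, tanh_amp=-10.0, tanh_center=0.30, tanh_width=0.05,
                      gauss_amp=40.0, gauss_center=0.15, gauss_width=0.04)
    for t, hz in sample_contour(params, 0.0, 0.4, frame=0.1):
        print(f"{t:.1f} s  {hz:6.1f} Hz")
```

Keeping each component as a separately named parameter makes the contour easy to compare across syllables, for example when clustering parameter vectors as a stand-in for the unsupervised learning of pitch accent types mentioned above.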
  • Keywords
    hidden Markov models; multimedia computing; natural languages; speech processing; speech recognition; German TTS; German adaptation; HMM speech recognition techniques; corpus search; global prosodic parameters; intermediate phonetic representation layer; intonation; large pronunciation lexicon; multimedia CD-ROM; part of speech tagger; phonemes; pitch accent types; prosody research; prosody semantics; prosody syntax interaction; speaking style; speaking styles; syllable alignment; time aligned POS tags; tone sequence model; unsupervised learning; words; CD recording; CD-ROMs; Displays; Hidden Markov models; Natural languages; Read only memory; Speech recognition; Stress; Unsupervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607938
  • Filename
    607938