Goethe for prosody

Author

Rapp, Stefan

Author_Institution

Inst. für Maschinelle Sprachverarbeitung, Stuttgart Univ., Germany

Volume

3

fYear

1996

fDate

3-6 Oct 1996

Firstpage

1636

Abstract

We describe how a recording of Goethe's “Die Leiden des jungen Werther”, published on a multimedia CD-ROM (J.W. Goethe, 1995), was made accessible for prosody research. The recording is of interest because of its prosodic richness: it displays a large variety of registers and speaking styles. Application areas are: development of prosody models for German TTS, unsupervised learning of pitch accent types, corpus search for research on prosody-semantics and prosody-syntax interaction, and the study of more global prosodic parameters (speaking rate, pitch range) that define registers or speaking styles. The four-hour recording was segmented into phonemes, syllables and words using HMM speech recognition techniques (S. Rapp, 1995) together with a large pronunciation lexicon (R.H. Baayen et al., 1993). A part-of-speech tagger (H. Schmid, 1995) was applied to the corpus to yield time-aligned POS tags. The German adaptation of the tone sequence model of intonation used in Stuttgart (J. Mayer, 1995; C. Féry, 1993) inspired the parametrization of fundamental frequency. An intermediate phonetic representation layer is described that uses the syllable alignment to parametrize the F₀ contour as a superposition of three functions: a hyperbolic tangent, a Gaussian and a constant.
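The three-component F₀ parametrization mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives only the shapes of the functions (hyperbolic tangent, Gaussian, constant) and says they are tied to the syllable alignment; the function name `f0_model`, all parameter names, and the choice to normalize time by the syllable's midpoint and duration are assumptions made here for illustration and are not taken from the paper.

```python
import math


def f0_model(t, onset, offset, tanh_amp, tanh_slope,
             gauss_amp, gauss_width, baseline):
    """Evaluate a hypothetical F0 model (in Hz) at time t (in seconds).

    The model is a sum of three terms: a hyperbolic tangent (a step-like
    rise or fall), a Gaussian (a local peak or dip), and a constant.
    Time is normalized relative to one syllable; this normalization is
    an assumption, not the paper's documented formulation.
    """
    center = 0.5 * (onset + offset)   # syllable midpoint
    duration = offset - onset         # syllable length
    x = (t - center) / duration       # syllable-normalized time

    step = tanh_amp * math.tanh(tanh_slope * x)
    peak = gauss_amp * math.exp(-0.5 * (x / gauss_width) ** 2)
    return baseline + step + peak


# Sample the contour across one syllable spanning 1.0 s to 1.5 s.
contour = [
    f0_model(1.0 + 0.05 * i, 1.0, 1.5, 20.0, 4.0, 15.0, 0.5, 120.0)
    for i in range(11)
]
```

At the syllable midpoint the tanh term vanishes and the Gaussian is at its maximum, so the value there is `baseline + gauss_amp`; far from the syllable the Gaussian decays to zero and the contour approaches `baseline - tanh_amp` on the left and `baseline + tanh_amp` on the right.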

Keywords

hidden Markov models; multimedia computing; natural languages; speech processing; speech recognition; German TTS; German adaptation; HMM speech recognition techniques; corpus search; global prosodic parameters; intermediate phonetic representation layer; intonation; large pronunciation lexicon; multimedia CD-ROM; part of speech tagger; phonemes; pitch accent types; prosody research; prosody semantics; prosody syntax interaction; speaking style; speaking styles; syllable alignment; time aligned POS tags; tone sequence model; unsupervised learning; words; CD recording; CD-ROMs; Displays; Hidden Markov models; Natural languages; Read only memory; Speech recognition; Stress; Unsupervised learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607938

Filename

607938