• DocumentCode
    3234328
  • Title

    A first study on neural net based generation of prosodic and spectral information for Mandarin text-to-speech

  • Author

    Chen, Sin-Horng ; Hwang, Shaw-Hwa ; Tsai, Chun-Yu

  • Author_Institution
    Dept. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • Volume
    2
  • fYear
    1992
  • fDate
    23-26 Mar 1992
  • Firstpage
    45
  • Abstract
    A neural-network-based approach to generating prosodic and spectral information of syllables for Mandarin text-to-speech synthesis is studied. Some contextual features are first extracted from a given input text by text analysis and taken as input signals for synthesis. Then, six multilayer perceptrons are employed to generate pause duration, syllable duration, and pitch mean and shape of one- and two-syllable synthesis units. Several reproduction templates of proper size are first generated for each synthesis unit of syllable approach. The objective is to generate spectral patterns of the syllable that can be directly concatenated to synthesize natural speech without further modification. The validity of this novel approach was examined by simulation using a database of sentential utterances recorded from TV news reported by a single female announcer. Experimental results confirmed that this is a promising approach for Mandarin text-to-speech synthesis.
  • Keywords
    feedforward neural nets; natural languages; speech synthesis; Mandarin text-to-speech; TV news; contextual features; female speech; multilayer perceptrons; natural speech; neural net based generation; one-syllable synthesis units; pause duration; pitch mean; prosodic information; sentential utterances; simulation; spectral information; spectral patterns; syllable duration; two-syllable synthesis units; Concatenated codes; Data mining; Feature extraction; Multilayer perceptrons; Natural languages; Neural networks; Shape; Signal synthesis; Speech synthesis; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-0532-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.1992.226124
  • Filename
    226124