A new speech synthesis system based on the ARX speech production model

Author

Zhu, Weizhong ; Kasuya, Hideki

Author_Institution

Fac. of Eng., Utsunomiya Univ., Japan

Volume

3

fYear

1996

fDate

3-6 Oct 1996

Firstpage

1413

Abstract

We present a new formant-type speech analysis-synthesis system based on the ARX (Auto-Regressive with Exogenous Input) speech production model. The model consists of cascade formant-antiformant synthesizers driven by a voicing source and an unvoiced turbulent noise source. A key feature of the proposed method is an algorithm that automatically measures the voicing source, unvoiced source, and formant-antiformant parameters of the synthesizer directly from natural speech waveforms. Once the parameter estimates have been obtained from natural speech, they can be manipulated with a flexible editing tool developed as part of the system. By changing the values of the fundamental frequency, glottal open quotient, spectral tilt parameter, turbulent noise level, and formant-antiformant frequencies and bandwidths, we can synthesize natural-sounding speech with various voice qualities, including modal, breathy, tense, and whisper voice. Acoustic correlates of these voice qualities can be systematically investigated using the proposed system. Since our analysis-editing-synthesis system has been developed on the MS-Windows platform, it is expected to be a useful tool in various basic areas of speech science and technology.
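The signal path described in the abstract (a periodic voicing source plus turbulent noise, shaped by a cascade of formant and antiformant sections) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Klatt-style second-order digital resonators for the formant sections, uses a plain impulse train as a stand-in for the voicing source (the abstract's glottal open quotient and spectral tilt parameters are not modeled), and feeds the noise through the same cascade as the voicing source for brevity. All function names, parameter names, and numeric values are hypothetical.

```python
import math
import random


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of a Klatt-style second-order resonator (assumed form)."""
    r = math.exp(-math.pi * bw_hz / fs)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    c = -r * r
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs):
    """Formant (pole pair): y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def antiresonate(x, freq_hz, bw_hz, fs):
    """Antiformant (zero pair): the exact inverse filter of resonate()."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    x1 = x2 = 0.0
    out = []
    for s in x:
        out.append((s - b * x1 - c * x2) / a)
        x2, x1 = x1, s
    return out


def excitation(n, f0_hz, noise_level, fs, seed=0):
    """Impulse train at f0 (placeholder voicing source) plus Gaussian noise."""
    rng = random.Random(seed)
    period = max(1, round(fs / f0_hz))
    return [
        (1.0 if i % period == 0 else 0.0) + noise_level * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def synthesize(n, f0_hz, noise_level, formants, antiformants, fs=16000):
    """Pass the excitation through cascaded formant and antiformant sections."""
    y = excitation(n, f0_hz, noise_level, fs)
    for freq_hz, bw_hz in formants:
        y = resonate(y, freq_hz, bw_hz, fs)
    for freq_hz, bw_hz in antiformants:
        y = antiresonate(y, freq_hz, bw_hz, fs)
    return y


# Hypothetical parameter sets: same formants, different noise levels.
vowel = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
low_noise = synthesize(1600, 120.0, 0.01, vowel, [])
high_noise = synthesize(1600, 120.0, 0.20, vowel, [])
```

Raising `noise_level` while keeping the formants fixed is only a crude analogue of the editing workflow the abstract describes; a closer approximation would replace the impulse train with a parametric glottal pulse.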

Keywords

noise; parameter estimation; spectral analysis; speech processing; speech synthesis; statistical analysis; ARX speech production model; Auto-Regressive with Exogenous Input; MS-Windows; acoustic correlates; bandwidths; cascade formant-antiformant synthesizers; flexible editing tool; formant-type speech analysis system; fundamental frequency; glottal open quotient; natural sounding speech; natural speech; natural speech waveforms; spectral tilt parameter; speech synthesis system; turbulent noise level; unvoiced turbulent noise source; voicing source; Acoustic noise; Bandwidth; Frequency synthesizers; Natural languages; Noise level; Parameter estimation; Production systems; Speech analysis; Speech enhancement; Speech synthesis

fLanguage

English

Publisher

IEEE

Conference_Title

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607879

Filename

607879