DocumentCode
312236
Title
A new speech synthesis system based on the ARX speech production model
Author
Zhu, Weizhong ; Kasuya, Hideki
Author_Institution
Fac. of Eng., Utsunomiya Univ., Japan
Volume
3
fYear
1996
fDate
3-6 Oct 1996
Firstpage
1413
Abstract
We present a new formant-type speech analysis-synthesis system based on the ARX (Auto-Regressive with Exogenous Input) speech production model. The model consists of cascade formant-antiformant synthesizers driven by a voicing source and an unvoiced turbulent noise source. A key feature of the proposed method is an algorithm that automatically measures the voicing source, unvoiced source, and formant-antiformant parameters of the synthesizer directly from natural speech waveforms. Once the parameter estimates have been obtained from natural speech, they can be manipulated with a flexible editing tool developed as part of the system. By changing the values of the fundamental frequency, glottal open quotient, spectral tilt parameter, turbulent noise level, and formant-antiformant frequencies and bandwidths, we can synthesize natural-sounding speech with various voice qualities, including modal, breathy, tense, and whisper voice. The acoustic correlates of these voice qualities can thus be investigated systematically using the proposed system. Since our analysis-editing-synthesis system has been developed on the MS-Windows platform, we expect it to be a useful tool in various basic areas of speech science and technology.
Keywords
noise; parameter estimation; spectral analysis; speech processing; speech synthesis; statistical analysis; ARX speech production model; Auto-Regressive with Exogenous Input; MS-Windows; acoustic correlates; bandwidths; cascade formant-antiformant synthesizers; flexible editing tool; formant-type speech analysis system; fundamental frequency; glottal open quotient; natural sounding speech; natural speech; natural speech waveforms; parameter estimation; spectral tilt parameter; speech synthesis system; turbulent noise level; unvoiced turbulent noise source; voicing source; Acoustic noise; Bandwidth; Frequency synthesizers; Natural languages; Noise level; Parameter estimation; Production systems; Speech analysis; Speech enhancement; Speech synthesis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607879
Filename
607879