Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis

Author

Stylianou, Yannis

Author_Institution

SIPS, AT&T Bell Labs., Florham Park, NJ, USA

Volume

1

fYear

1999

fDate

15-19 Mar 1999

Firstpage

377

Abstract

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. We present an automatic method for assessing voice quality in large speech databases for concatenative speech synthesis and correcting it whenever necessary. The proposed method uses a Gaussian mixture model (GMM) to model the acoustic space of the speaker of the database and autoregressive filters for compensation. An objective method for measuring the effectiveness of the database correction, based on a likelihood function for the speaker's GMM, is presented as well. Both objective and subjective results show that the proposed method detects voice quality problems and corrects them successfully. Results show a 14.2% improvement of the log-likelihood function after compensation.
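The assessment step described in the abstract can be illustrated with a minimal sketch: score each recording session by its average log-likelihood under the speaker's GMM and flag sessions that fall well below a reference score. Everything specific here is an assumption made for illustration and is not taken from the paper: the diagonal covariances, the one-dimensional toy features, the fixed GMM parameters, the `tolerance` threshold, and the function names. The autoregressive compensation filters are not covered.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_pdf = 0.0
        for x, m, v in zip(frame, mu, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(math.log(w) + log_pdf)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def average_log_likelihood(frames, weights, means, variances):
    """Mean per-frame log-likelihood of a list of feature vectors."""
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)


def flag_sessions(sessions, reference_score, weights, means, variances, tolerance):
    """Names of sessions scoring more than `tolerance` below the reference."""
    flagged = []
    for name, frames in sessions.items():
        score = average_log_likelihood(frames, weights, means, variances)
        if reference_score - score > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical two-component GMM over a one-dimensional feature.
WEIGHTS = [0.5, 0.5]
MEANS = [[-0.5], [0.5]]
VARIANCES = [[1.0], [1.0]]

# Toy sessions: one close to the modeled acoustic space, one shifted away.
SESSIONS = {
    "reference_like": [[0.0], [0.1]],
    "shifted": [[3.0], [3.2]],
}

reference = average_log_likelihood(
    SESSIONS["reference_like"], WEIGHTS, MEANS, VARIANCES
)
flagged = flag_sessions(SESSIONS, reference, WEIGHTS, MEANS, VARIANCES, tolerance=1.0)
print(flagged)
```

In practice the GMM would be trained on feature vectors from a session judged to have the desired voice quality, and the same average log-likelihood could be recomputed after compensation to measure the change objectively.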

Keywords

Gaussian processes; autoregressive processes; filtering theory; spectral analysis; speech intelligibility; speech synthesis; Gaussian mixture model; acoustic space; automatic method; autoregressive filters; compensation; concatenative speech synthesis; database correction; large speech databases; likelihood function; listening tests; log-likelihood function; objective method; objective results; prosodic characteristics; spectral characteristics; subjective results; voice quality assessment; voice quality correction; voice quality variabilities; Acoustic measurements; Filters; Labeling; Loudspeakers; Quality assessment; Signal processing; Smoothing methods; Spatial databases; Speech synthesis; Synthesizers;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on

Conference_Location

Phoenix, AZ

ISSN

1520-6149

Print_ISBN

0-7803-5041-3

Type

conf

DOI

10.1109/ICASSP.1999.758141

Filename

758141