Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model

Author

George, E. Bryan; Smith, Mark J. T.

Author_Institution

Signal Process. Center of Technol., Lockheed-Martin Inc., Nashua, NH, USA

Volume

5

Issue

5

fYear

1997

fDate

9/1/1997

Firstpage

389

Lastpage

406

Abstract

Sinusoidal modeling has been successfully applied to a broad range of speech processing problems, and offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis/synthesis and modification. This paper presents a novel speech analysis/synthesis system based on the combination of an overlap-add sinusoidal model with an analysis-by-synthesis technique to determine the model parameters. It describes this analysis procedure in detail, and introduces an equivalent frequency-domain algorithm that takes advantage of the computational efficiency of the fast Fourier transform (FFT). In addition, a refined overlap-add sinusoidal model capable of shape-invariant speech modification is derived, and a pitch-scale modification algorithm is defined that preserves speech bandwidth and eliminates noise migration effects. Analysis-by-synthesis achieves very high synthetic speech quality by accurately estimating the component frequencies, eliminating sidelobe interference effects, and effectively dealing with nonstationary speech events. The refined overlap-add synthesis model correlates well with analysis-by-synthesis, and modifies speech without objectionable artifacts by explicitly controlling shape invariance and phase coherence. The proposed analysis-by-synthesis/overlap-add (ABS/OLA) system allows for both fixed and time-varying time-, frequency-, and pitch-scale modifications, and computational shortcuts using the FFT algorithm make its implementation feasible using currently available hardware.
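The general idea of analysis-by-synthesis parameter estimation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single stationary frame, picks each candidate frequency from an FFT peak of the Hann-windowed residual, fits amplitude and phase by least squares, and subtracts the synthesized sinusoid before searching for the next one. The function name `abs_sinusoid_fit` and all of its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def abs_sinusoid_fit(frame, fs, n_components, n_fft=4096):
    """Greedy sinusoid fit to one frame (illustrative sketch, not the paper's method).

    Returns a list of (frequency_hz, amplitude, phase_rad) tuples and the
    residual left after subtracting every fitted sinusoid.
    """
    n = np.arange(len(frame))
    window = np.hanning(len(frame))
    residual = np.asarray(frame, dtype=float).copy()
    params = []
    for _ in range(n_components):
        # Candidate frequency: strongest bin of the windowed residual spectrum,
        # skipping the DC and Nyquist bins.
        spectrum = np.fft.rfft(window * residual, n_fft)
        k = int(np.argmax(np.abs(spectrum[1:-1]))) + 1
        freq = k * fs / n_fft

        # Least-squares amplitude and phase at that frequency:
        # a*cos(wn) + b*sin(wn) == amp*cos(wn + phase).
        omega = 2.0 * np.pi * freq / fs
        basis = np.column_stack([np.cos(omega * n), np.sin(omega * n)])
        coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
        a, b = coeffs
        params.append((freq, float(np.hypot(a, b)), float(np.arctan2(-b, a))))

        # Synthesis step: remove the fitted sinusoid so that later iterations
        # analyze only what the model has not yet explained.
        residual = residual - basis @ coeffs
    return params, residual
```

Because each component is subtracted before the next search, a strong sinusoid's sidelobes are less likely to be mistaken for weaker components, which is the intuition behind the sidelobe-interference claim in the abstract. The frequency resolution here is limited to the FFT grid (`fs / n_fft`), a simplification made only for this sketch.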

Keywords

correlation methods; fast Fourier transforms; frequency estimation; speech intelligibility; speech processing; speech synthesis; FFT algorithm; analysis by synthesis model; computational efficiency; correlation; fast Fourier transform; frequency-domain algorithm; model parameters; nonstationary speech events; overlap-add sinusoidal model; overlap-add synthesis model; phase coherence; pitch scale modification algorithm; shape invariant speech modification; sidelobe interference effects; sinusoidal modeling; speech analysis/synthesis system; speech bandwidth; synthetic speech quality; time varying frequency scale modification; time varying time scale modification; Algorithm design and analysis; Fourier transforms; Frequency domain analysis; Frequency estimation; Predictive models; Shape control; Speech analysis; Speech enhancement; Speech processing; Speech synthesis

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.622558

Filename

622558