• DocumentCode
    1548022
  • Title

    Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model

  • Author

    George, E. Bryan ; Smith, Mark J T

  • Author_Institution
    Signal Processing Center of Technology, Lockheed-Martin Inc., Nashua, NH, USA
  • Volume
    5
  • Issue
    5
  • fYear
    1997
  • fDate
    9/1/1997 12:00:00 AM
  • Firstpage
    389
  • Lastpage
    406
  • Abstract
    Sinusoidal modeling has been successfully applied to a broad range of speech processing problems, and offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis/synthesis and modification. This paper presents a novel speech analysis/synthesis system based on the combination of an overlap-add sinusoidal model with an analysis-by-synthesis technique to determine the model parameters. It describes this analysis procedure in detail, and introduces an equivalent frequency-domain algorithm that takes advantage of the computational efficiency of the fast Fourier transform (FFT). In addition, a refined overlap-add sinusoidal model capable of shape-invariant speech modification is derived, and a pitch-scale modification algorithm is defined that preserves speech bandwidth and eliminates noise migration effects. Analysis-by-synthesis achieves very high synthetic speech quality by accurately estimating the component frequencies, eliminating sidelobe interference effects, and effectively dealing with nonstationary speech events. The refined overlap-add synthesis model correlates well with analysis-by-synthesis, and modifies speech without objectionable artifacts by explicitly controlling shape invariance and phase coherence. The proposed analysis-by-synthesis/overlap-add (ABS/OLA) system allows for both fixed and time-varying time-, frequency-, and pitch-scale modifications, and computational shortcuts using the FFT algorithm make its implementation feasible using currently available hardware.
  • Keywords
    correlation methods; fast Fourier transforms; frequency estimation; speech intelligibility; speech processing; speech synthesis; FFT algorithm; analysis-by-synthesis model; computational efficiency; correlation; fast Fourier transform; frequency-domain algorithm; model parameters; nonstationary speech events; overlap-add sinusoidal model; overlap-add synthesis model; phase coherence; pitch-scale modification algorithm; shape-invariant speech modification; sidelobe interference effects; sinusoidal modeling; speech analysis/synthesis system; speech bandwidth; synthetic speech quality; time-varying frequency-scale modification; time-varying time-scale modification; Algorithm design and analysis; Fourier transforms; Frequency domain analysis; Predictive models; Shape control; Speech analysis; Speech enhancement
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.622558
  • Filename
    622558