Speech re-synthesis from spectrogram image through sinusoidal modelling

Author

Garg, Mayank ; Singhal, Roshani

Author_Institution

Electr. & Electron. Dept., Birla Inst. of Technol. & Sci., Pilani, India

fYear

2014

fDate

24-27 Sept. 2014

Firstpage

2757

Lastpage

2761

Abstract

A novel method to extract parameters, i.e. frequencies and their bandwidths, for intelligible speech synthesis is presented in this paper. The parameters are extracted from spectrogram images of pre-recorded male and female voice samples and used to re-synthesize speech with sinusoidal signals. Phase continuity is preserved by quantifying the time scale and identifying the phase at temporal boundaries for a given frequency. The amplitudes of the sinusoids follow a Gaussian distribution, and frequency overlap is used to extend the bandwidth from 4 kHz to 6 kHz to improve the clarity of the synthesized speech. The synthesized speech is then passed through a weighting filter to improve the envelope of the re-synthesized time-domain signal. The resulting speech sounds synthetic but is noticeably intelligible.
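
The phase-continuity and Gaussian-amplitude ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the frame length, the sample rate, and the choice of half the bandwidth as the Gaussian standard deviation are all assumptions made for illustration only.

```python
import numpy as np

def synthesize_track(freqs_hz, amps, frame_len, sample_rate):
    """Render one sinusoidal track frame by frame (hypothetical helper).

    The phase reached at the end of each frame is carried into the next
    frame, so the waveform has no jump at frame boundaries even when the
    frequency changes.
    """
    phase = 0.0
    n = np.arange(frame_len)
    frames = []
    for f, a in zip(freqs_hz, amps):
        step = 2.0 * np.pi * f / sample_rate  # phase advance per sample
        frames.append(a * np.sin(phase + step * n))
        # Starting phase of the next frame, wrapped to [0, 2*pi)
        phase = (phase + step * frame_len) % (2.0 * np.pi)
    return np.concatenate(frames)

def gaussian_weights(freqs_hz, centre_hz, bandwidth_hz):
    """Gaussian amplitude weights around a centre frequency.

    Assumption: sigma is half the bandwidth; the paper may define the
    relationship differently.
    """
    sigma = bandwidth_hz / 2.0
    return np.exp(-0.5 * ((freqs_hz - centre_hz) / sigma) ** 2)

# Usage: a track that jumps from 200 Hz to 300 Hz at a frame boundary
signal = synthesize_track([200.0, 300.0], [1.0, 1.0], frame_len=80, sample_rate=8000)
weights = gaussian_weights(np.array([900.0, 1000.0, 1100.0]), 1000.0, 200.0)
```

Because the phase is accumulated rather than reset, the sample-to-sample change across the boundary is no larger than within a frame, which is what avoids audible clicks in sinusoidal re-synthesis.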

Keywords

Gaussian distribution; filtering theory; speech synthesis; time-domain analysis; amplitude distribution; frequency 6 kHz; frequency overlap; intelligible speech synthesis; parameter extraction; phase continuity; sinusoidal modelling; sinusoidal signals; spectrogram image; speech resynthesis; time-domain signal resynthesis; time-scale quantification; weighting filter; Bayes methods; Gaussian filter; sinusoidal synthesis; synthetic speech

fLanguage

English

Publisher

IEEE

Conference_Title

Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on

Conference_Location

New Delhi

Print_ISBN

978-1-4799-3078-4

Type

conf

DOI

10.1109/ICACCI.2014.6968501

Filename

6968501