DocumentCode
166245
Title
Speech re-synthesis from spectrogram image through sinusoidal modelling
Author
Garg, Mayank ; Singhal, Roshani
Author_Institution
Electr. & Electron. Dept., Birla Inst. of Technol. & Sci., Pilani, India
fYear
2014
fDate
24-27 Sept. 2014
Firstpage
2757
Lastpage
2761
Abstract
A novel method for extracting parameters, namely frequencies and their bandwidths, for intelligible speech synthesis is presented. The parameters are extracted from spectrogram images of pre-recorded male and female voice samples and used to re-synthesize speech from sinusoidal signals. Phase continuity is preserved by quantifying the time scale and identifying the phase at temporal boundaries for a given frequency. The amplitudes of the sinusoids follow a Gaussian distribution, and frequency overlap is used to extend the bandwidth from 4 kHz to 6 kHz, which improves the clarity of the synthesized speech. The synthesized speech is then passed through a weighting filter to improve the envelope of the re-synthesized time-domain signal. The resulting speech sounds synthetic but is noticeably intelligible.
Keywords
Gaussian distribution; filtering theory; speech synthesis; time-domain analysis; amplitude distribution; frequency 6 kHz; frequency overlap; intelligible speech synthesis; parameter extraction; phase continuity; sinusoidal modelling; sinusoidal signals; spectrogram image; speech resynthesis; time-domain signal resynthesis; time-scale quantification; weighting filter; Bayes methods; Gaussian filter; sinusoidal synthesis; synthetic speech
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location
New Delhi
Print_ISBN
978-1-4799-3078-4
Type
conf
DOI
10.1109/ICACCI.2014.6968501
Filename
6968501