DocumentCode :
1504741
Title :
Towards Interpretive Models for 2-D Processing of Speech
Author :
Wang, Tianyu T. ; Quatieri, Thomas F.
Author_Institution :
Lincoln Lab., Massachusetts Inst. of Technol., Lexington, MA, USA
Volume :
20
Issue :
7
Year :
2012
Firstpage :
2159
Lastpage :
2173
Abstract :
This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation we refer to as the wideband Grating Compression Transform (WGCT). We develop frequency-dependent, speech-production-based models of speech signals for the WGCT, building on previous work in modeling narrowband-based GCT representations (NGCT). Comparisons show important distinctions, including dual behavior, between the wideband and narrowband models, and distinct ways in which vocal tract/formant content is distributed redundantly throughout the NGCT and WGCT spaces. Our results motivate a novel taxonomy of speech-signal behavior as an interpretative framework (i.e., in relation to speech-production characteristics) for 2-D processing of speech using the GCT, as well as for other 2-D approaches and time-frequency distributions such as the auditory spectrogram. We demonstrate and evaluate the ability of the model to represent real speech content through demodulation techniques for analysis/synthesis of wideband spectrograms. Finally, we develop a co-channel speaker separation method, using prior and estimated pitch information, based on the WGCT, as well as through fusion with the NGCT. These GCT-based separation systems are compared against and further fused with a reference sinusoidal separation system.
Keywords :
Fourier analysis; demodulation; speech processing; time-frequency analysis; 2D Fourier analysis; 2D speech processing; auditory spectrogram; cochannel speaker separation; demodulation techniques; frequency-dependent speech-production-based models; interpretive models; local time-frequency regions; narrowband-based GCT representations; sinusoidal separation system; speech signals; time-frequency distributions; wideband grating compression transform; wideband spectrograms; Modulation; Narrowband; Spectrogram; Speech; Speech processing; Time frequency analysis; Wideband; 2-D processing of speech; Grating Compression Transform; co-channel speaker separation; spectrogram reconstruction; wideband spectrogram;
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2194282
Filename :
6191312