Towards Interpretive Models for 2-D Processing of Speech

Author

Wang, Tianyu T. ; Quatieri, Thomas F.

Author_Institution

Lincoln Lab., Massachusetts Inst. of Technol., Lexington, MA, USA

Volume

20

Issue

7

fYear

2012

Firstpage

2159

Lastpage

2173

Abstract

This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation we refer to as the wideband Grating Compression Transform (WGCT). We develop frequency-dependent, speech-production-based models of speech signals for the WGCT, building on previous work in modeling narrowband-based GCT representations (NGCT). Comparisons show important distinctions, including dual behavior, between the wideband and narrowband models, and distinct ways in which vocal tract/formant content is distributed redundantly throughout the NGCT and WGCT spaces. Our results motivate a novel taxonomy of speech-signal behavior as an interpretative framework (i.e., in relation to speech-production characteristics) for 2-D processing of speech using the GCT, as well as for other 2-D approaches and time-frequency distributions such as the auditory spectrogram. We demonstrate and evaluate the ability of the model to represent real speech content through demodulation techniques for analysis/synthesis of wideband spectrograms. Finally, we develop a co-channel speaker separation method, using prior and estimated pitch information, based on the WGCT, as well as through fusion with the NGCT. These GCT-based separation systems are compared against and further fused with a reference sinusoidal separation system.
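
The core operation described above, a 2-D Fourier analysis of a local time-frequency region of a wideband spectrogram, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`wideband_spectrogram`, `local_gct`), the 4 ms analysis window, the 1 ms hop, the Hamming windows, the patch size, the use of a linear-magnitude spectrogram, and the synthetic pulse-train input are all assumptions chosen for illustration.

```python
import numpy as np

def wideband_spectrogram(x, fs, win_ms=4.0, hop_ms=1.0, nfft=256):
    """Magnitude STFT with a short window (assumed 4 ms, shorter than a pitch period)."""
    win_len = int(round(fs * win_ms / 1000.0))
    hop = max(1, int(round(fs * hop_ms / 1000.0)))
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    # Rows are frequency bins, columns are time frames.
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))

def local_gct(spec, f0, t0, n_freq, n_time):
    """2-D Fourier magnitude of one local time-frequency patch (hypothetical helper)."""
    patch = spec[f0 : f0 + n_freq, t0 : t0 + n_time]
    patch = patch - patch.mean()  # remove the patch mean so DC does not dominate
    w2d = np.outer(np.hamming(patch.shape[0]), np.hamming(patch.shape[1]))
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * w2d)))

# Synthetic stand-in for voiced speech: an impulse train with a 100 Hz pitch.
fs = 8000
x = np.zeros(fs)
x[:: fs // 100] = 1.0

spec = wideband_spectrogram(x, fs)
n_freq, n_time, hop_s = 32, 50, 0.001
gct = local_gct(spec, f0=20, t0=100, n_freq=n_freq, n_time=n_time)

# Axis of temporal modulation frequencies (Hz) matching the fftshift-ed columns.
temporal_mod_hz = np.fft.fftshift(np.fft.fftfreq(n_time, d=hop_s))
peak_row, peak_col = np.unravel_index(np.argmax(gct), gct.shape)
peak_hz = abs(temporal_mod_hz[peak_col])
```

For this idealized input the wideband spectrogram shows only periodic vertical striations at the pitch period, so the patch's 2-D spectrum should concentrate along the temporal-modulation axis near the pitch rate. Real speech would add formant-dependent structure that this sketch does not model.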

Keywords

Fourier analysis; demodulation; speech processing; time-frequency analysis; 2D Fourier analysis; 2D speech processing; auditory spectrogram; cochannel speaker separation; demodulation techniques; frequency-dependent speech-production-based models; interpretive models; local time-frequency regions; narrowband-based GCT representations; sinusoidal separation system; speech signals; time-frequency distributions; wideband grating compression transform; wideband spectrograms; Modulation; Narrowband; Spectrogram; Speech; Speech processing; Time frequency analysis; Wideband; 2-D processing of speech; Grating Compression Transform; co-channel speaker separation; spectrogram reconstruction; wideband spectrogram;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2012.2194282

Filename

6191312