  • DocumentCode
    1504741
  • Title
    Towards Interpretive Models for 2-D Processing of Speech
  • Author
    Wang, Tianyu T.; Quatieri, Thomas F.
  • Author_Institution
    Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA
  • Volume
    20
  • Issue
    7
  • fYear
    2012
  • Firstpage
    2159
  • Lastpage
    2173
  • Abstract
    This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation we refer to as the wideband Grating Compression Transform (WGCT). We develop frequency-dependent, speech-production-based models of speech signals for the WGCT, building on previous work in modeling narrowband-based GCT representations (NGCT). Comparisons show important distinctions, including dual behavior, between the wideband and narrowband models, and distinct ways in which vocal tract/formant content is distributed redundantly throughout the NGCT and WGCT spaces. Our results motivate a novel taxonomy of speech-signal behavior as an interpretative framework (i.e., in relation to speech-production characteristics) for 2-D processing of speech using the GCT, as well as for other 2-D approaches and time-frequency distributions such as the auditory spectrogram. We demonstrate and evaluate the ability of the model to represent real speech content through demodulation techniques for analysis/synthesis of wideband spectrograms. Finally, we develop a co-channel speaker separation method, using prior and estimated pitch information, based on the WGCT, as well as through fusion with the NGCT. These GCT-based separation systems are compared against and further fused with a reference sinusoidal separation system.
  • Keywords
    Fourier analysis; demodulation; speech processing; time-frequency analysis; 2D Fourier analysis; 2D speech processing; auditory spectrogram; cochannel speaker separation; demodulation techniques; frequency-dependent speech-production-based models; interpretive models; local time-frequency regions; narrowband-based GCT representations; sinusoidal separation system; speech signals; time-frequency distributions; wideband grating compression transform; wideband spectrograms; Modulation; Narrowband; Spectrogram; Speech; Speech processing; Time frequency analysis; Wideband; 2-D processing of speech; Grating Compression Transform; co-channel speaker separation; spectrogram reconstruction; wideband spectrogram;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2194282
  • Filename
    6191312