Title :
Two-Dimensional Speech-Signal Modeling
Author :
Wang, Tianyu T. ; Quatieri, Thomas F.
Author_Institution :
Lincoln Lab., Massachusetts Inst. of Technol. (MIT), Lexington, MA, USA
Abstract :
Traditional approaches in speech-signal processing analyze short-time frames of the signal (e.g., the short-time Fourier transform). Findings from auditory neurophysiology coupled with image processing principles, however, have motivated an alternative 2-D processing framework in which 2-D analysis is performed on the time-frequency distribution itself. This paper develops a 2-D model of speech in local time-frequency regions of narrowband spectrograms using sinusoidal-series-based modulation. Our model is shown to distribute vocal tract and onset/offset content based on source information (e.g., noise and voicing) in a transformed 2-D space, thereby explicitly representing different classes of energy modulations commonly observed in spectrograms. We demonstrate the model's ability to represent speech sounds by developing and evaluating algorithms for analysis/synthesis of spectrograms. As an example application, we demonstrate the utility of the model for co-channel speaker separation using prior pitch information of two overlapping speakers. Finally, our separation scheme based on 2-D modeling is compared against a reference (frame-based) sinusoidal separation system using both prior and estimated pitch.
Keywords :
speaker recognition; speech processing; time-frequency analysis; 2-D processing framework; auditory neurophysiology; cochannel speaker separation; image processing principles; onset-offset content; pitch information; sinusoidal-series-based modulation; source information; time-frequency distribution; two-dimensional speech-signal processing modeling; vocal tract; Analytical models; Frequency modulation; Spectrogram; Speech; Speech processing; Time frequency analysis; 2-D processing of speech; Grating Compression Transform (GCT); co-channel speaker separation; spectrogram reconstruction; spectrotemporal modulations;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2188795
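
Illustrative sketch (not part of the record): the abstract describes 2-D analysis performed on local time-frequency regions of a narrowband spectrogram. A minimal way to see the idea is to take a windowed 2-D Fourier transform of one spectrogram patch, in the spirit of the Grating Compression Transform named in the keywords, and read the harmonic spacing off the dominant peak. Everything below is an assumption made for illustration and is not the authors' algorithm: the synthetic 200 Hz harmonic signal, the 32 ms window and 1 ms hop, the patch location, and the helper names `narrowband_spectrogram`, `local_2d_transform`, and `dominant_harmonic_spacing_hz` are all hypothetical.

```python
import numpy as np


def narrowband_spectrogram(x, fs, win_ms=32.0, hop_ms=1.0, nfft=1024):
    """Magnitude STFT using a long Hamming window, so harmonics are resolved."""
    win_len = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # Transpose so rows index frequency bins and columns index time frames.
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1)).T


def local_2d_transform(patch):
    """Magnitude of the 2-D DFT of a mean-removed, Hamming-windowed patch."""
    w2d = np.outer(np.hamming(patch.shape[0]), np.hamming(patch.shape[1]))
    return np.abs(np.fft.fft2((patch - patch.mean()) * w2d))


def dominant_harmonic_spacing_hz(patch, bin_hz):
    """Convert the strongest modulation along frequency into a spacing in Hz."""
    mag = local_2d_transform(patch)
    n_freq = patch.shape[0]
    half = mag[1 : n_freq // 2, :]  # skip the DC row, keep positive modulations
    k, _ = np.unravel_index(np.argmax(half), half.shape)
    k += 1  # undo the offset introduced by skipping row 0
    return n_freq / k * bin_hz


# Assumed test signal: equal-amplitude harmonics of a steady 200 Hz pitch.
fs, f0, nfft = 8000, 200.0, 1024
t = np.arange(int(0.3 * fs)) / fs
x = sum(np.cos(2 * np.pi * f0 * h * t) for h in range(1, 20))

spec = narrowband_spectrogram(x, fs, nfft=nfft)
patch = spec[64:192, 100:120]  # roughly 500-1500 Hz by 20 ms (assumed region)
spacing_hz = dominant_harmonic_spacing_hz(patch, bin_hz=fs / nfft)
print(f"estimated harmonic spacing: {spacing_hz:.1f} Hz")
```

For this steady-pitch input the estimated spacing should land close to the 200 Hz pitch, because the harmonic stripes in the patch form a single dominant grating along the frequency axis. A changing pitch would tilt the stripes and move the peak off the zero-time-modulation column, which is the kind of structure a 2-D model of the patch can exploit.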