DocumentCode :
641136
Title :
Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance
Author :
Khan, M.S. ; Naqvi, Syed Mohsen ; Chambers, Jonathon
Author_Institution :
Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK
fYear :
2013
fDate :
1-3 July 2013
Firstpage :
1
Lastpage :
6
Abstract :
This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated from video. In the first, pre-processing stage, the late reverberant speech components are suppressed by a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters (the interaural phase difference and the interaural level difference) and the spatial covariance are modeled in the short-time Fourier transform (STFT) domain to assign individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach for source separation in highly reverberant rooms.
Keywords :
Fourier transforms; audio signal processing; expectation-maximisation algorithm; probability; source separation; speech processing; time-frequency analysis; EM algorithm; PESQ; STFT domain; binaural spatial parameters; expectation-maximization algorithm; interaural level difference; interaural phase difference; interaural spatial cues; late reverberant speech component suppression; multimodal method; perceptual evaluation; preprocessing stage; probabilistic models; reverberant rooms; short-time Fourier transform domain; signal-to-distortion ratio; spatial covariance; spectral subtraction rule; speech quality; speech source reconstruction; time-frequency unit classification; two-stage audio-visual speech dereverberation; two-stage audio-visual speech separation; two-stage speech source separation algorithm; Covariance matrices; Gain; Reverberation; Source separation; Speech; Time-frequency analysis; Vectors; Source separation; expectation-maximization; reverberation; spatial cues; time-frequency masking;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Signal Processing (DSP), 2013 18th International Conference on
Conference_Location :
Fira
ISSN :
1546-1874
Type :
conf
DOI :
10.1109/ICDSP.2013.6622780
Filename :
6622780