Regional Information Center for Science and Technology - Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance

DocumentCode :

641136

Title :

Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance

Author :

Khan, M.S. ; Naqvi, Syed Mohsen ; Chambers, Jonathon

Author_Institution :

Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK

fYear :

2013

fDate :

1-3 July 2013

Firstpage :

Lastpage :

Abstract :

This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated through video. In the first, pre-processing stage, the late reverberant speech components are suppressed by a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters (the interaural phase difference and the interaural level difference) and the spatial covariance are modeled in the short-time Fourier transform (STFT) domain to assign individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach for source separation in highly reverberant rooms.
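
The interaural cues named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only computes the interaural level difference (ILD) and interaural phase difference (IPD) per time-frequency unit from a two-channel signal. The function names `stft` and `interaural_cues`, the frame length of 512 samples, and the hop of 128 samples are assumptions chosen for illustration; the spectral subtraction, spatial covariance model, and EM steps are not shown.

```python
import numpy as np


def stft(x, frame_len=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for every time-frequency unit.

    eps is an illustrative guard against division by zero in silent bins.
    """
    L = stft(left)
    R = stft(right)
    # Level difference: ratio of the channel magnitudes, expressed in dB.
    ild_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Phase difference: angle of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return ild_db, ipd


# Usage: the right channel is the left channel attenuated by a factor of two.
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = 0.5 * left
ild_db, ipd = interaural_cues(left, right)
```

In this toy input the two channels differ only by a gain of one half, so the ILD is about 6 dB in every unit and the IPD is zero. In a real binaural recording both cues vary with source direction, which is what makes them usable for assigning TF units to sources.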

Keywords :

Fourier transforms; audio signal processing; expectation-maximisation algorithm; probability; source separation; speech processing; time-frequency analysis; EM algorithm; PESQ; STFT domain; binaural spatial parameters; expectation-maximization algorithm; interaural level difference; interaural phase difference; interaural spatial cues; late reverberant speech component suppression; multimodal method; perceptual evaluation; preprocessing stage; probabilistic models; reverberant rooms; short-time Fourier transform domain; signal-to-distortion ratio; spatial covariance; spectral subtraction rule; speech quality; speech source reconstruction; time-frequency unit classification; two-stage audio-visual speech dereverberation; two-stage audio-visual speech separation; two-stage speech source separation algorithm; Covariance matrices; Gain; Reverberation; Source separation; Speech; Time-frequency analysis; Vectors; Source separation; expectation-maximization; reverberation; spatial cues; time-frequency masking;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Digital Signal Processing (DSP), 2013 18th International Conference on

Conference_Location :

Fira

ISSN :

1546-1874

Type :

conf

DOI :

10.1109/ICDSP.2013.6622780

Filename :

6622780

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=641136