Regional Information Center for Science and Technology - Integrating binaural cues and blind source separation method for separating reverberant speech mixtures

DocumentCode :

2149818

Title :

Integrating binaural cues and blind source separation method for separating reverberant speech mixtures

Author :

Alinaghi, Atiyeh ; Wang, Wenwu ; Jackson, Philip J B

Author_Institution :

Dept. of Electron. Eng. (FEPS), Univ. of Surrey, Guildford, UK

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

209

Lastpage :

212

Abstract :

This paper presents a new method for reverberant speech separation that combines binaural cues with blind source separation (BSS) to automatically classify the time-frequency (T-F) units of the speech mixture spectrogram. The main idea is to model the interaural phase difference, the interaural level difference and the frequency bin-wise mixing vectors of each source with Gaussian mixture models, evaluate each source's model at every T-F point, and assign the units with high probability to that source. The model parameters and the assigned regions are refined iteratively using the expectation-maximization (EM) algorithm. The proposed method also addresses the permutation problem of frequency-domain BSS by initializing the mixing vectors for each frequency channel. The EM algorithm starts with the binaural cues alone; after a few iterations, the estimated probabilistic mask is used to initialize and re-estimate the mixing vector model parameters. Experiments on speech mixtures showed an average improvement of about 0.8 dB in signal-to-distortion ratio (SDR) over the binaural-only baseline.
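The EM-based assignment of T-F units described in the abstract can be illustrated with a much simplified sketch. The code below is not the authors' implementation: it assumes a single scalar cue per T-F unit (a synthetic interaural level difference in dB) and one Gaussian per source, whereas the paper models phase difference, level difference and mixing vectors jointly. The function name `em_soft_mask`, its parameters and the synthetic data are all hypothetical.

```python
import numpy as np


def em_soft_mask(cue, n_sources=2, n_iter=20):
    """Fit a 1-D Gaussian mixture to a per-unit cue with EM (illustrative only).

    cue: array of shape (n_freq, n_frames), e.g. a level difference in dB.
    Returns posteriors of shape (n_sources, n_freq, n_frames), usable as soft masks.
    """
    x = cue.ravel()
    # Deterministic initialisation: place the means at evenly spaced data quantiles.
    means = np.quantile(x, (np.arange(n_sources) + 0.5) / n_sources)
    variances = np.full(n_sources, x.var() + 1e-6)
    weights = np.full(n_sources, 1.0 / n_sources)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at each T-F unit.
        log_p = (
            np.log(weights)[:, None]
            - 0.5 * np.log(2 * np.pi * variances)[:, None]
            - 0.5 * (x[None, :] - means[:, None]) ** 2 / variances[:, None]
        )
        log_p -= log_p.max(axis=0, keepdims=True)  # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=0, keepdims=True)

        # M-step: re-estimate the Gaussian parameters from the soft assignments.
        n_k = post.sum(axis=1) + 1e-12
        means = (post @ x) / n_k
        variances = (post * (x[None, :] - means[:, None]) ** 2).sum(axis=1) / n_k + 1e-6
        weights = n_k / n_k.sum()

    return post.reshape((n_sources,) + cue.shape)


# Synthetic example (assumed data): two sources with level differences near -6 dB and +6 dB.
rng = np.random.default_rng(0)
true_source = rng.integers(0, 2, size=(64, 50))
ild = np.where(true_source == 0, -6.0, 6.0) + rng.normal(0.0, 1.0, size=(64, 50))

mask = em_soft_mask(ild)            # soft mask per source
hard = mask.argmax(axis=0)          # most probable source at each T-F unit
accuracy = (hard == true_source).mean()
```

In this toy setting each slice `mask[k]` plays the role of the probabilistic mask for source `k`; multiplying it with the mixture spectrogram would give a rough estimate of that source. Real reverberant mixtures overlap far more than this synthetic cue, which is why the paper combines several cues.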

Keywords :

Gaussian processes; blind source separation; expectation-maximisation algorithm; probability; reverberation; signal classification; speech intelligibility; speech processing; EM algorithm; Gaussian mixture model; T-F point; automatic classification; binaural cues; estimated probabilistic mask; expectation-maximization algorithm; frequency bin-wise mixing vector; frequency channel; frequency domain BSS; interaural level difference; interaural phase difference; mixing vector model parameter; permutation problem; reverberant speech mixture; reverberant speech separation; signal-to-distortion; speech mixture spectrogram; time-frequency unit; Frequency domain analysis; Microphones; Spectrogram; Speech; mixing vectors

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5946377

Filename :

5946377

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2149818