Variational EM for clustering interaural phase cues in MESSL for blind source separation of speech

Author

Zohny, Zeinab ; Naqvi, Syed Mohsen ; Chambers, Jonathon A.

Author_Institution

Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK

fYear

2015

fDate

19-24 April 2015

Firstpage

3966

Lastpage

3970

Abstract

The model-based expectation-maximization source separation and localization (MESSL) technique is a probabilistic time-frequency masking algorithm that achieves underdetermined blind source separation of speech sources. Using only two-channel recordings, MESSL clusters spectrogram points based on their interaural spatial cues. Gaussian mixture models (GMMs) are assumed for the interaural cues, and their parameters are determined by maximum likelihood estimation (MLE) via the expectation-maximization (EM) framework. However, singularities and over-fitting are major drawbacks of MLE. In this paper, we investigate variational Bayesian (VB) inference for clustering spectrogram points based specifically on their interaural phase difference (IPD) cues. Variational inference overcomes the difficulties associated with likelihood optimization and improves separation, especially when the sources are in close proximity. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal-to-distortion ratio (SDR).
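
The following is a minimal sketch of the ideas named in the abstract, not the authors' implementation. It computes an IPD per time-frequency point, forms a soft mask from Gaussian responsibilities (the E-step), and contrasts the maximum-likelihood variance update with a prior-regularized one. The regularized update is a simple posterior-mode estimate under an inverse-gamma prior, used here only as a stand-in for the full variational treatment. All function names, the toy data, and the prior values `a0` and `b0` are assumptions made for illustration.

```python
import cmath
import math


def ipd(left, right):
    """IPD of one time-frequency point from its left/right STFT coefficients."""
    return cmath.phase(left * right.conjugate())


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def gauss(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(phis, means, variances, weights):
    """E-step: posterior probability of each source for each point (a soft mask)."""
    out = []
    for phi in phis:
        p = [w * gauss(wrap(phi - m), v)
             for m, v, w in zip(means, variances, weights)]
        total = sum(p)
        out.append([x / total for x in p])
    return out


def weighted_sq_residual(phis, resp_k, mean):
    """Responsibility-weighted sum of squared wrapped phase residuals."""
    return sum(r * wrap(phi - mean) ** 2 for r, phi in zip(resp_k, phis))


def variance_mle(phis, resp_k, mean):
    """Maximum-likelihood variance update for one mixture component."""
    return weighted_sq_residual(phis, resp_k, mean) / sum(resp_k)


def variance_with_prior(phis, resp_k, mean, a0=1.0, b0=0.1):
    """Posterior-mode variance under an inverse-gamma(a0, b0) prior.

    a0 and b0 are hypothetical values chosen for this sketch.
    """
    s = weighted_sq_residual(phis, resp_k, mean)
    return (2.0 * b0 + s) / (2.0 * a0 + sum(resp_k) + 2.0)


# Toy two-channel data: three time-frequency points with IPDs near 0.3, 0.35, -0.9.
points = [(1 + 0j, cmath.exp(-0.3j)),
          (1 + 0j, cmath.exp(-0.35j)),
          (1 + 0j, cmath.exp(0.9j))]
phis = [ipd(l, r) for l, r in points]

# Soft mask for two assumed sources with mean IPDs 0.3 and -0.9.
mask = responsibilities(phis, means=[0.3, -0.9],
                        variances=[0.05, 0.05], weights=[0.5, 0.5])

# A component that explains a single point exactly: the ML variance collapses
# to zero (the singularity), while the prior keeps the estimate positive.
var_mle = variance_mle([phis[2]], [1.0], phis[2])
var_map = variance_with_prior([phis[2]], [1.0], phis[2])
```

In this toy case the maximum-likelihood variance of the single-point component is exactly zero, which would make the Gaussian density unbounded in the next E-step, whereas the prior-regularized estimate stays strictly positive. This is the kind of degenerate solution that a Bayesian treatment of the mixture parameters is meant to avoid.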

Keywords

Bayes methods; Gaussian processes; audio databases; audio signal processing; blind source separation; expectation-maximisation algorithm; inference mechanisms; mixture models; optimisation; speech processing; EM framework; GMM; Gaussian mixture models; IPD cues; MESSL cluster spectrogram; MESSL technique; MLE; SDR; TIMIT database; VB inference; interaural phase cues; interaural spatial cues; likelihood optimization; maximum likelihood estimation; model-based expectation maximization source separation and localization; probabilistic time-frequency masking algorithm; signal to distortion ratio; speech mixtures; speech sources; two-channel recordings; variational Bayesian inference; Nickel; Spectrogram; Speech; expectation-maximization; time-frequency masking

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178715

Filename

7178715