DocumentCode
730614
Title
Variational EM for clustering interaural phase cues in MESSL for blind source separation of speech
Author
Zohny, Zeinab ; Naqvi, Syed Mohsen ; Chambers, Jonathon A.
Author_Institution
Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK
fYear
2015
fDate
19-24 April 2015
Firstpage
3966
Lastpage
3970
Abstract
The model-based expectation maximization source separation and localization (MESSL) technique is a probabilistic time-frequency masking algorithm that achieves underdetermined blind source separation of speech sources. Using only two-channel recordings, MESSL clusters spectrogram points based on their interaural spatial cues. Gaussian mixture models (GMMs) are assumed for the interaural cues, and their parameters are determined by maximum likelihood estimation (MLE) via the expectation maximization (EM) framework. However, singularities and over-fitting are major drawbacks of MLE. In this paper, we investigate variational Bayesian (VB) inference for clustering spectrogram points based specifically on their interaural phase difference (IPD) cues. Variational inference overcomes the difficulties associated with likelihood optimization and improves the separation, especially when the sources are in close proximity. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal-to-distortion ratio (SDR).
Keywords
Bayes methods; Gaussian processes; audio databases; audio signal processing; blind source separation; expectation-maximisation algorithm; inference mechanisms; mixture models; optimisation; speech processing; EM framework; GMM; Gaussian mixture models; IPD cues; MESSL cluster spectrogram; MESSL technique; MLE; SDR; TIMIT database; VB inference; interaural phase cues; interaural spatial cues; likelihood optimization; maximum likelihood estimation; model-based expectation maximization source separation and localization; probabilistic time-frequency masking algorithm; signal to distortion ratio; speech mixtures; speech sources; two-channel recordings; variational Bayesian inference; Nickel; Spectrogram; Speech; expectation-maximization; time-frequency masking
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178715
Filename
7178715