  • DocumentCode
    730614
  • Title
    Variational EM for clustering interaural phase cues in MESSL for blind source separation of speech
  • Author
    Zohny, Zeinab; Naqvi, Syed Mohsen; Chambers, Jonathon A.
  • Author_Institution
    Advanced Signal Processing Group, Loughborough University, Loughborough, UK
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    3966
  • Lastpage
    3970
  • Abstract
    The model-based expectation-maximization source separation and localization (MESSL) technique is a probabilistic time-frequency masking algorithm that achieves underdetermined blind source separation of speech. Using only two-channel recordings, MESSL clusters spectrogram points based on their interaural spatial cues. Gaussian mixture models (GMMs) are assumed for the interaural cues, and their parameters are determined by maximum likelihood estimation (MLE) within the expectation-maximization (EM) framework. However, MLE suffers from two major drawbacks: singularities and over-fitting. In this paper, we investigate variational Bayesian (VB) inference for clustering spectrogram points based particularly on their interaural phase difference (IPD) cues. Variational inference avoids the difficulties associated with likelihood optimization and improves separation, especially when the sources are in close proximity. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal-to-distortion ratio (SDR).
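The abstract describes clustering time-frequency points by their IPD with a Gaussian mixture fitted by variational Bayes rather than by maximum-likelihood EM. The Python sketch below illustrates that general idea under simplifying assumptions that are not taken from the paper: a single frequency band, a 1-D Gaussian on the IPD with phase wrapping ignored (instead of MESSL's full frequency-dependent model), conjugate Dirichlet and Normal-Gamma priors with hypothetical hyperparameter values, and synthetic points from two hypothetical sources. The helper names (`digamma`, `ipd`, `make_ipd_points`, `vb_gmm_1d`) are illustrative only.

```python
import cmath
import math
import random


def digamma(x):
    """Digamma (psi) for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + (math.log(x) - 0.5 / x
                     - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def ipd(left, right):
    """Interaural phase difference of one STFT point, wrapped to (-pi, pi]."""
    return cmath.phase(left * right.conjugate())


def make_ipd_points(seed=0, centres=(-0.6, 0.9), per_source=200, spread=0.15):
    """Hypothetical test data: two-channel STFT points from two sources in one band."""
    rng = random.Random(seed)
    points = []
    for centre in centres:
        for _ in range(per_source):
            phi = rng.uniform(-math.pi, math.pi)  # arbitrary source phase
            delay = rng.gauss(centre, spread)     # per-point IPD around the source's value
            left = cmath.rect(1.0, phi)
            right = cmath.rect(0.8, phi - delay)  # level difference is ignored here
            points.append(ipd(left, right))
    return points


def vb_gmm_1d(x, k, iters=100, alpha0=1e-3, beta0=1.0, a0=1.0, b0=0.01):
    """Variational Bayesian 1-D GMM with Dirichlet and Normal-Gamma priors.

    Treats the IPD as a real-valued Gaussian variable (phase wrapping is ignored).
    Returns expected mixture weights, posterior means and responsibilities.
    """
    n = len(x)
    m0 = sum(x) / n
    # Initialise by hard-assigning each point to the nearest of k data quantiles.
    xs = sorted(x)
    init = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    r = []
    for xi in x:
        nearest = min(range(k), key=lambda c: abs(xi - init[c]))
        r.append([1.0 if j == nearest else 0.0 for j in range(k)])
    for _ in range(iters):
        # M-step: update the posterior hyperparameters of every component.
        alpha, beta, m, a, b = [], [], [], [], []
        for j in range(k):
            nk = sum(ri[j] for ri in r) + 1e-10
            xbar = sum(ri[j] * xi for ri, xi in zip(r, x)) / nk
            sk = sum(ri[j] * (xi - xbar) ** 2 for ri, xi in zip(r, x)) / nk
            alpha.append(alpha0 + nk)
            beta.append(beta0 + nk)
            m.append((beta0 * m0 + nk * xbar) / (beta0 + nk))
            a.append(a0 + 0.5 * nk)
            b.append(b0 + 0.5 * (nk * sk + beta0 * nk / (beta0 + nk) * (xbar - m0) ** 2))
        # E-step: responsibilities from expected log-weights and log-precisions
        # (the constant -0.5*log(2*pi) cancels in the normalisation).
        psi_total = digamma(sum(alpha))
        e_log_pi = [digamma(al) - psi_total for al in alpha]
        e_log_lam = [digamma(aj) - math.log(bj) for aj, bj in zip(a, b)]
        r = []
        for xi in x:
            log_rho = [e_log_pi[j] + 0.5 * e_log_lam[j]
                       - 0.5 * (1.0 / beta[j] + a[j] / b[j] * (xi - m[j]) ** 2)
                       for j in range(k)]
            top = max(log_rho)
            w = [math.exp(v - top) for v in log_rho]
            total = sum(w)
            r.append([v / total for v in w])
    weights = [al / sum(alpha) for al in alpha]
    return weights, m, r


if __name__ == "__main__":
    weights, means, _ = vb_gmm_1d(make_ipd_points(), k=4)
    for w, mu in sorted(zip(weights, means), reverse=True):
        print(f"weight={w:.3f}  mean IPD={mu:+.3f} rad")
```

Because each posterior rate `b_k` never drops below the prior rate `b0`, the expected precision `a_k / b_k` stays finite even if a component collapses onto a single point, which sidesteps the singularity problem the abstract attributes to MLE. A small Dirichlet concentration `alpha0` additionally lets the weights of unneeded components shrink toward zero; how quickly that happens depends on the hypothetical hyperparameters chosen here.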
  • Keywords
    Bayes methods; Gaussian processes; audio databases; audio signal processing; blind source separation; expectation-maximisation algorithm; inference mechanisms; mixture models; optimisation; speech processing; EM framework; GMM; Gaussian mixture models; IPD cues; MESSL cluster spectrogram; MESSL technique; MLE; SDR; TIMIT database; VB inference; interaural phase cues; interaural spatial cues; likelihood optimization; maximum likelihood estimation; model-based expectation maximization source separation and localization; probabilistic time-frequency masking algorithm; signal to distortion ratio; speech mixtures; speech sources; two-channel recordings; variational Bayesian inference; Spectrogram; Speech; expectation-maximization; time-frequency masking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    South Brisbane, QLD, Australia
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178715
  • Filename
    7178715