• DocumentCode
    730155
  • Title
    Representation models in single channel source separation
  • Author
    Zohrer, Matthias ; Pernkopf, Franz
  • Author_Institution
    Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    713
  • Lastpage
    717
  • Abstract
    Model-based single-channel source separation (SCSS) is an ill-posed problem that requires source-specific prior knowledge. In this paper, we use representation learning and compare general stochastic networks (GSNs), Gauss Bernoulli restricted Boltzmann machines (GBRBMs), conditional Gauss Bernoulli restricted Boltzmann machines (CGBRBMs), and higher order contractive autoencoders (HCAEs) for modeling this source-specific knowledge. In particular, these models learn a mapping from speech mixture spectrogram representations to single-source spectrogram representations, i.e., we apply them as filters for the speech mixture. At test time, the individual source spectrograms of both models are inferred, and the soft mask for re-synthesis of the time signals is derived from them. We evaluate the deep architectures on data from the 2nd CHiME speech separation challenge and provide results for four tasks: speaker-dependent, speaker-independent, matched noise condition, and unmatched noise condition. In our experiments, GSNs achieve the best average PESQ and overall perceptual scores in all four tasks.
  • Keywords
    Boltzmann machines; acoustic filters; acoustic noise; acoustic radiators; acoustic signal processing; signal representation; source separation; speech coding; speech synthesis; stochastic processes; 2nd CHiME speech separation; Gauss Bernoulli restricted Boltzmann machines; conditional Gauss Bernoulli restricted Boltzmann machines; filter; general stochastic networks; higher order contractive autoencoders; matched noise condition; model-based single-channel source separation; perceptual score; representation learning; representation models; single-source spectrogram representations; source-specific knowledge; speech mixture spectrogram representation; time signal resynthesis; unmatched noise condition task; Data models; Noise; Source separation; Spectrogram; Speech; Stochastic processes; Training; deep neural networks; general stochastic network; representation models; single channel source separation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178062
  • Filename
    7178062
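
The abstract mentions that a soft mask is derived from the two inferred source spectrograms and used for re-synthesis. This record does not give the exact formula, so the following is only a minimal sketch under one common assumption: the mask is the ratio of one estimated magnitude to the sum of both. The function names (`soft_masks`, `apply_mask`), the `eps` constant, and the toy values are all hypothetical and are not taken from the paper.

```python
def soft_masks(spec_a, spec_b, eps=1e-12):
    """Return (mask_a, mask_b) from two estimated magnitude spectrograms.

    Each spectrogram is a list of frames; each frame is a list of
    non-negative magnitudes, one per frequency bin.
    Assumption: ratio-style mask, mask_a = a / (a + b).
    """
    mask_a, mask_b = [], []
    for frame_a, frame_b in zip(spec_a, spec_b):
        # eps avoids division by zero when both estimates are zero
        row_a = [a / (a + b + eps) for a, b in zip(frame_a, frame_b)]
        mask_a.append(row_a)
        mask_b.append([1.0 - m for m in row_a])
    return mask_a, mask_b


def apply_mask(mask, mixture):
    """Weight each time-frequency bin of the mixture by the mask."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


# Toy example: 2 frames x 3 frequency bins, made-up values.
est_speech = [[3.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
est_noise = [[1.0, 1.0, 4.0], [2.0, 0.0, 3.0]]
mixture = [[4.0, 2.0, 4.0], [4.0, 2.0, 4.0]]

mask_speech, mask_noise = soft_masks(est_speech, est_noise)
speech_out = apply_mask(mask_speech, mixture)
noise_out = apply_mask(mask_noise, mixture)
```

Because the two masks sum to one in every bin, the masked outputs add back up to the mixture magnitude. In a real pipeline the masked magnitudes would be combined with the mixture phase and inverted to a time signal, which this sketch leaves out.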