  • DocumentCode
    1504032
  • Title

    Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment

  • Author

    Sawada, Hiroshi ; Araki, Shoko ; Makino, Shoji

  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • Volume
    19
  • Issue
    3
  • fYear
    2011
  • fDate
    3/1/2011
  • Firstpage
    516
  • Lastpage
    527
  • Abstract
    This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can be applied even to the underdetermined case, where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered by source using an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples must be aligned. This is solved in the second stage by using the probability that each sample belongs to its assigned class. This two-stage structure makes it possible to attain good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.
  • Keywords
    audio signal processing; blind source separation; expectation-maximisation algorithm; microphones; reverberation; expectation-maximization algorithm; frequency bin-wise clustering; frequency-domain mixture samples; permutation alignment; permutation ambiguities; speech mixtures; speech-audio sources; underdetermined convolutive blind source separation; Acoustic applications; Acoustic sensors; Blind source separation; Fourier transforms; Microphones; Nonlinear filters; Reverberation; Source separation; Speech; Time frequency analysis; Blind source separation (BSS); convolutive mixture; expectation–maximization (EM) algorithm; permutation problem; short-time Fourier transform (STFT); sparseness; time–frequency (T–F) masking
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2051355
  • Filename
    5473129