DocumentCode :
1361508
Title :
Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation
Author :
Ozerov, Alexey ; Févotte, Cédric
Author_Institution :
CNRS LTCI, Telecom ParisTech, Paris, France
Volume :
18
Issue :
3
fYear :
2010
fDate :
3/1/2010
Firstpage :
550
Lastpage :
563
Abstract :
We consider inference in a general data-driven object-based model of multichannel audio data, assumed to be generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired by nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first maximizes the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second maximizes the sum of the individual likelihoods of all channels using a multiplicative update algorithm inspired by NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, and professionally produced music recordings. Our EM method produces results competitive with the state of the art, as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).
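The source model described above builds on NMF of a nonnegative power spectrogram V (frequencies by frames) under the Itakura-Saito (IS) divergence, V ≈ WH. The following is a minimal single-channel sketch of that building block only, written in plain Python for self-containment. It is not the paper's multichannel EM or multiplicative algorithm: the mixing matrices are omitted. The names `is_nmf`, `is_divergence`, and `_ratio_terms`, the random initialization, and the square-root exponent on the update ratio are illustrative assumptions. The exponent gives a majorization-minimization variant whose IS cost is non-increasing; plain, unexponentiated multiplicative updates are also commonly used.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list product of an (m x k) matrix and a (k x n) matrix."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def is_divergence(v: Matrix, vhat: Matrix) -> float:
    """Itakura-Saito divergence: sum of v/vhat - log(v/vhat) - 1 over all entries."""
    total = 0.0
    for row_v, row_hat in zip(v, vhat):
        for x, y in zip(row_v, row_hat):
            r = x / y
            total += r - math.log(r) - 1.0
    return total


def _ratio_terms(v: Matrix, vhat: Matrix) -> tuple[Matrix, Matrix]:
    """Return V * Vhat^-2 (elementwise) and Vhat^-1, the two IS gradient terms."""
    num = [[x / (y * y) for x, y in zip(rv, rh)] for rv, rh in zip(v, vhat)]
    den = [[1.0 / y for y in rh] for rh in vhat]
    return num, den


def is_nmf(v: Matrix, n_components: int, n_iter: int = 100, seed: int = 0):
    """Fit V ~ W H (W: F x K, H: K x N) with multiplicative IS-NMF updates.

    Returns (W, H, costs): costs[0] is the IS cost at initialization and
    costs[i] is the cost after iteration i.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(n_components)] for _ in range(n_freq)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_frames)] for _ in range(n_components)]
    costs = [is_divergence(v, matmul(w, h))]
    for _ in range(n_iter):
        # H <- H * sqrt((W^T (V * Vhat^-2)) / (W^T Vhat^-1)), W held fixed.
        num_in, den_in = _ratio_terms(v, matmul(w, h))
        wt = transpose(w)
        num, den = matmul(wt, num_in), matmul(wt, den_in)
        h = [[hk * math.sqrt(n / d) for hk, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(h, num, den)]
        # W <- W * sqrt(((V * Vhat^-2) H^T) / (Vhat^-1 H^T)), using the updated H.
        num_in, den_in = _ratio_terms(v, matmul(w, h))
        ht = transpose(h)
        num, den = matmul(num_in, ht), matmul(den_in, ht)
        w = [[wk * math.sqrt(n / d) for wk, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(w, num, den)]
        costs.append(is_divergence(v, matmul(w, h)))
    return w, h, costs


if __name__ == "__main__":
    # Hypothetical positive "power spectrogram": 8 frequency bins x 6 frames.
    rng = random.Random(1)
    V = [[rng.uniform(0.5, 2.0) for _ in range(6)] for _ in range(8)]
    W, H, costs = is_nmf(V, n_components=2, n_iter=50)
    print(f"IS cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

Because every update multiplies by a ratio of positive quantities, W and H stay strictly positive, which keeps the IS divergence well defined. For real spectrograms, a vectorized array implementation would be used instead of nested lists.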
Keywords :
Fourier transforms; audio signal processing; blind source separation; convolution; matrix decomposition; speech processing; statistical analysis; Fourier transform; Itakura-Saito divergence; NMF methodology; blind separation; convolutive mixtures; decomposition algorithm; expectation-maximization algorithm; general data-driven object-based model; linear instantaneous mixing; multichannel audio data; multichannel data; multichannel nonnegative matrix factorization; multiplicative update algorithm; music sources; signal separation evaluation campaign; source signals; speech sources; statistical model; stereo audio source separation; superimposed Gaussian component; supervised separation; synthetic instantaneous mixtures; underdetermined convolutive mixture; Expectation-maximization (EM) algorithm; multichannel audio; nonnegative matrix factorization (NMF); nonnegative tensor factorization (NTF); underdetermined convolutive blind source separation (BSS);
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2009.2031510
Filename :
5229304