Regional Information Center for Science and Technology (RICeST) - Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation

DocumentCode :

1361508

Title :

Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation

Author :

Ozerov, Alexey ; Févotte, Cédric

Author_Institution :

CNRS LTCI, Telecom ParisTech, Paris, France

Volume :

Issue :

fYear :

2010

fDate :

3/1/2010

Firstpage :

550

Lastpage :

563

Abstract :

We consider inference in a general data-driven object-based model of multichannel audio data, assumed to be generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired by nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second consists of maximizing the sum of the individual likelihoods of all channels using a multiplicative update algorithm inspired by NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces results competitive with the state of the art, as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).
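To make the NMF building block mentioned in the abstract concrete, here is a minimal single-channel sketch of NMF under the Itakura-Saito divergence, using the commonly cited heuristic multiplicative updates. It is not the multichannel EM or multiplicative update algorithm of the paper: it has no mixing parameters and handles one power spectrogram only. The function names `is_divergence` and `is_nmf`, the random initialization, and the `eps` safeguard are assumptions made for illustration.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Approximate a strictly positive matrix V (F x N) as W @ H.

    Illustrative single-channel sketch: W (F x K) holds spectral patterns,
    H (K x N) holds their activations over time frames.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Random positive initialization (an assumption, not a prescribed scheme).
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, N)) + eps
    for _ in range(n_iter):
        # Update W with H fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
        # Update H with the new W fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
    return W, H


if __name__ == "__main__":
    # Synthetic "power spectrogram" built from 3 nonnegative components.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 30)) + 1e-3
    W, H = is_nmf(V, n_components=3)
    print("IS divergence after fitting:", is_divergence(V, W @ H))
```

In a separation setting, `V` would typically be the squared magnitude of an STFT; the factors `W` and `H` then play the role of the per-source spectral patterns and activations that the abstract refers to.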

Keywords :

Fourier transforms; audio signal processing; blind source separation; convolution; matrix decomposition; speech processing; statistical analysis; Fourier transform; Itakura-Saito divergence; NMF methodology; blind separation; convolutive mixtures; decomposition algorithm; expectation-maximization algorithm; general data-driven object-based model; linear instantaneous mixing; multichannel audio data; multichannel data; multichannel nonnegative matrix factorization; multiplicative update algorithm; music sources; signal separation evaluation campaign; source signals; speech sources; statistical model; stereo audio source separation; superimposed Gaussian component; supervised separation; synthetic instantaneous mixtures; underdetermined convolutive mixture; Expectation-maximization (EM) algorithm; multichannel audio; nonnegative matrix factorization (NMF); nonnegative tensor factorization (NTF); underdetermined convolutive blind source separation (BSS);

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2031510

Filename :

5229304

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1361508