Joint audio source separation and dereverberation based on multichannel factorial hidden Markov model

Author

Higuchi, Takuya ; Kameoka, Hirokazu

Author_Institution

Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan

fYear

2014

fDate

21-24 Sept. 2014

Firstpage

Lastpage

Abstract

This paper proposes a unified approach for jointly solving underdetermined source separation, audio event detection and dereverberation of convolutive mixtures. For monaural source separation, one successful approach applies non-negative matrix factorization (NMF) to the magnitude spectrogram of a mixture signal, interpreted as a non-negative matrix. Several attempts have recently been made to extend this approach to the multichannel case in order to use the spatial correlation of the multichannel inputs as an additional cue for source separation. Multichannel NMF assumes that an observed signal is a mixture of a limited number of source signals, each of which has a static power spectral density scaled by a time-varying amplitude. We have previously proposed an extension of this approach in which the variations over time of the spectral density and the total power of each source are modeled by a hidden Markov model (HMM). This has allowed us to solve source activity detection and source separation simultaneously through model parameter inference. While that method was based on an anechoic mixing model, the aim of this paper is to extend it to deal with reverberation by incorporating an echoic mixing model into the generative model of observed signals. In an experiment on underdetermined source separation under reverberant conditions, we confirmed that the proposed method provided a 9.61 dB improvement over the conventional method in terms of the signal-to-interference ratio.
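
As a rough illustration of the monaural starting point the abstract mentions (NMF applied to a magnitude spectrogram), the sketch below factorizes a non-negative matrix into a few static spectra scaled by time-varying gains. It is not the paper's multichannel factorial HMM. The function name `nmf`, the symbols `V`, `W`, `H`, the toy data, and the choice of the Euclidean cost with multiplicative updates are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V (freq x time) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumed choice).
    W holds one static spectrum per column, H the time-varying gains.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "magnitude spectrogram": two fixed spectra with time-varying gains.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = rng.random((2, 100))
V = W_true @ H_true

W, H = nmf(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column of `W` plays the role of a static spectral template and each row of `H` its amplitude over time, which is the single-channel analogue of the "static power spectral density scaled by a time-varying amplitude" assumption described above.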

Keywords

acoustic convolution; audio signal processing; blind source separation; echo; hidden Markov models; matrix decomposition; reverberation; signal detection; spectral analysis; time-varying channels; NMF; anechoic mixing model; audio event detection; audio source dereverberation; audio source separation; convolutive mixture dereverberation; echoic mixing model; mixture signal spectrogram; multichannel factorial hidden Markov model; multichannel spatial correlation utilization; nonnegative matrix factorization; observed signal generative model; reverberation; signal-to-interference ratio; source activity detection; spectral density time variation; static power spectral density; time-varying amplitude; Arrays; Hidden Markov models; Microphones; Optimization; Source separation; Spectrogram; Time-frequency analysis; audio event detection; dereverberation; multichannel factorial hidden Markov model; non-negative matrix factorization; source separation

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on

Conference_Location

Reims

Type

conf

DOI

10.1109/MLSP.2014.6958927

Filename

6958927

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=155685