Title :
Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria
Author :
Virtanen, Tuomas
Author_Institution :
Tampere Univ. of Technol.
fDate :
3/1/2007
Abstract :
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
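The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's exact algorithm. It assumes a squared Euclidean reconstruction error, an unnormalized sum of squared gain differences weighted by a hypothetical parameter `alpha`, and an L1 gain penalty weighted by a hypothetical parameter `beta`. The function name `nmf_smooth_sparse`, the default values, and the per-iteration rescaling of the spectra to unit norm are likewise choices made here for illustration.

```python
import numpy as np


def nmf_smooth_sparse(X, n_components, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Factor a nonnegative spectrogram X (freq x frames) as B @ G.

    Illustrative cost (an assumption, not the paper's exact formulation):
        0.5 * ||X - B G||^2
        + 0.5 * alpha * sum_k sum_t (G[k, t] - G[k, t-1])^2   (temporal continuity)
        + beta * sum(G)                                       (sparseness)
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape

    # Random nonnegative initialization, as described in the abstract.
    B = rng.random((n_freq, n_components)) + eps   # fixed spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Number of adjacent frames each frame has (1 at the edges, 2 inside).
    n_neigh = np.zeros(n_frames)
    n_neigh[1:] += 1.0
    n_neigh[:-1] += 1.0

    for _ in range(n_iter):
        # Gain update: ratio of the negative to the positive gradient parts.
        neigh = np.zeros_like(G)
        neigh[:, 1:] += G[:, :-1]   # previous-frame gains
        neigh[:, :-1] += G[:, 1:]   # next-frame gains
        numer = B.T @ X + alpha * neigh
        denom = B.T @ B @ G + alpha * n_neigh * G + beta + eps
        G *= numer / denom

        # Spectrum update: standard multiplicative rule for the Euclidean cost.
        B *= (X @ G.T) / (B @ (G @ G.T) + eps)

        # Heuristic: fix the scale ambiguity by giving each spectrum unit norm
        # and moving the scale into the gains (the product B @ G is unchanged).
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

    return B, G


# Small synthetic check: a rank-2 nonnegative "spectrogram".
demo_rng = np.random.default_rng(1)
X = demo_rng.random((64, 2)) @ demo_rng.random((2, 100))
B, G = nmf_smooth_sparse(X, n_components=2)
rel_err = np.linalg.norm(X - B @ G) / np.linalg.norm(X)
```

Because every factor in the update ratios is nonnegative, `B` and `G` stay nonnegative throughout. Setting `alpha=0` and `beta=0` reduces the sketch to basic nonnegative matrix factorization with the Euclidean cost.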
Keywords :
acoustic signal processing; iterative methods; matrix decomposition; signal reconstruction; source separation; unsupervised learning; drum sounds; independent subspace analysis; iterative estimation algorithm; magnitude spectrogram factorization; monaural sound source separation; multiplicative update rules; nonnegative matrix factorization; one-channel music signals; pitched musical instrument samples; reconstruction error; sparseness criteria; temporal continuity criterion; time-varying gain; unsupervised learning algorithm; Costs; Humans; Independent component analysis; Machine learning algorithms; Multiple signal classification; Music; Sparse matrices; Spectrogram; Acoustic signal analysis; audio source separation; blind source separation; sparse coding
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.885253