Multi-Stage Non-Negative Matrix Factorization for Monaural Singing Voice Separation

Author

Bilei Zhu ; Wei Li ; Ruijiang Li ; Xiangyang Xue

Author_Institution

Sch. of Comput. Sci., Fudan Univ., Shanghai, China

Volume

21

Issue

10

fYear

2013

fDate

Oct. 2013

Firstpage

2096

Lastpage

2107

Abstract

Separating the singing voice from the music accompaniment is useful for many applications, such as melody extraction, singer identification, lyrics alignment and recognition, and content-based music retrieval. In this paper, a novel algorithm for singing voice separation in monaural mixtures is proposed. The algorithm consists of two stages, in which non-negative matrix factorization (NMF) is applied to decompose mixture spectrograms computed with long and short analysis windows, respectively. A spectral discontinuity thresholding method is devised for the long-window NMF to select the NMF components originating from pitched instrumental sounds, and a temporal discontinuity thresholding method is designed for the short-window NMF to pick out the NMF components that come from percussive sounds. Eliminating the selected components filters most pitched and percussive elements of the music accompaniment out of the input sound mixture while having little effect on the singing voice. Extensive testing on the public MIR-1K dataset of 1000 short audio clips and the Beach-Boys dataset of 14 full-track real-world songs showed that the proposed algorithm is both effective and efficient.
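The core operation in each stage (factorize a magnitude spectrogram with NMF, then drop the components judged to belong to the accompaniment) can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the abstract does not specify the NMF cost function, so the sketch assumes Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence, and it assumes a soft (Wiener-like) mask to rebuild the filtered spectrogram. The spectral and temporal discontinuity thresholding rules are not reproduced; the `drop` indices below are hypothetical and chosen by hand purely for illustration. The names `nmf_kl` and `remove_components` are likewise hypothetical.

```python
import numpy as np


def nmf_kl(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (freq x time) as V ~ W @ H.

    Assumption: Lee-Seung multiplicative updates for the generalized
    KL divergence; the paper's actual cost function is not stated here.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps  # spectral bases
    H = rng.random((n_components, n_time)) + eps  # temporal activations
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def remove_components(V, W, H, drop, eps=1e-12):
    """Filter the components listed in `drop` out of V.

    Assumption: a soft mask (kept model / full model) is applied to the
    observed spectrogram, so the output never exceeds V elementwise.
    """
    keep = np.ones(W.shape[1], dtype=bool)
    keep[np.asarray(drop, dtype=int)] = False
    full = W @ H + eps
    kept = W[:, keep] @ H[keep, :]
    return V * kept / full


if __name__ == "__main__":
    # Toy magnitude "spectrogram": 64 frequency bins x 100 frames.
    rng = np.random.default_rng(1)
    V = np.abs(rng.standard_normal((64, 100)))
    W, H = nmf_kl(V, n_components=8)
    # Hypothetical selection; the paper selects components by
    # spectral (long window) or temporal (short window) discontinuity.
    V_filtered = remove_components(V, W, H, drop=[0, 3])
    print(V_filtered.shape)
```

In a two-stage arrangement such as the one described above, the same two functions would be applied once to a long-window spectrogram (dropping pitched-instrument components) and once to a short-window spectrogram (dropping percussive components), with an inverse STFT between the stages; that pipeline wiring is left out to keep the sketch self-contained.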

Keywords

filtering theory; matrix decomposition; speech synthesis; Beach-Boys dataset; MIR-1K public dataset; NMF components; audio clips; content-based music retrieval; full-track real-world songs; input sound mixture; long-window NMF; lyrics alignment; melody extraction; mixture spectrograms; monaural mixtures; monaural singing voice separation; multistage nonnegative matrix factorization; music accompaniment; nonnegative matrix factorization; short-window NMF; singer identification; spectral discontinuity thresholding method; temporal discontinuity thresholding method; voice separation; Multi-stage method; non-negative matrix factorization (NMF); singing voice separation; spectral discontinuity; temporal discontinuity

fLanguage

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2013.2266773

Filename

6525353