Singing voice analysis and editing based on mutually dependent F0 estimation and source separation

Author

Ikemiya, Yukara ; Yoshii, Kazuyoshi ; Itoyama, Katsutoshi

Author_Institution

Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan

fYear

2015

fDate

19-24 April 2015

Firstpage

574

Lastpage

578

Abstract

This paper presents a novel framework that improves both vocal fundamental frequency (F0) estimation and singing voice separation by exploiting the mutual dependency between the two tasks. A typical approach to singing voice separation is to estimate the vocal F0 contour from a target music signal and then extract the singing voice with a time-frequency mask that passes only the harmonic components, i.e., the vocal F0s and their overtones. Conversely, vocal F0 estimation is expected to become easier if the singing voice alone can be extracted accurately from the target signal. Most conventional studies have paid little attention to this mutual dependency. To overcome this limitation, our framework alternates between the two tasks, using the result of each in the other. More specifically, we first extract the singing voice by using robust principal component analysis (RPCA). The F0 contour is then estimated from the separated singing voice by finding the optimal path over an F0-saliency spectrogram based on subharmonic summation (SHS). The estimated F0 contour in turn enables us to improve singing voice separation by combining a time-frequency mask based on RPCA with a mask based on harmonic structures. Experimental results obtained when we used the proposed technique to directly edit vocal F0s in popular-music audio signals showed that it significantly improved both vocal F0 estimation and singing voice separation.
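The F0-tracking and masking steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RPCA stage is omitted, the subharmonic summation here works on a linear-frequency magnitude spectrum, and every function name, the harmonic decay of 0.84, the jump penalty, and the elementwise-AND mask combination are assumptions chosen only for illustration.

```python
def shs_saliency(spectrum, bin_hz, candidates_hz, n_harmonics=5, decay=0.84):
    """Simplified subharmonic summation for one frame: for each candidate F0,
    sum the spectral magnitude at its harmonics with geometrically decaying weights."""
    saliency = []
    for f0 in candidates_hz:
        total = 0.0
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_hz))  # nearest bin of the h-th harmonic
            if k < len(spectrum):
                total += (decay ** (h - 1)) * spectrum[k]
        saliency.append(total)
    return saliency


def viterbi_path(saliency, max_jump=1, jump_penalty=0.1):
    """Find the candidate-index path that maximizes summed saliency minus a
    penalty proportional to the jump between consecutive frames (assumed cost)."""
    n_cands = len(saliency[0])
    score = list(saliency[0])
    back = []  # back[t-1][i] = best predecessor of candidate i at frame t
    for frame in saliency[1:]:
        new_score, pointers = [], []
        for i in range(n_cands):
            lo, hi = max(0, i - max_jump), min(n_cands, i + max_jump + 1)
            best_j = max(range(lo, hi),
                         key=lambda j: score[j] - jump_penalty * abs(i - j))
            new_score.append(score[best_j] - jump_penalty * abs(i - best_j) + frame[i])
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final candidate.
    i = max(range(n_cands), key=lambda j: score[j])
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return path


def harmonic_mask(n_bins, bin_hz, f0_hz, n_harmonics=5, half_width_bins=1):
    """Binary mask for one frame that passes bins near the harmonics of f0_hz."""
    mask = [0] * n_bins
    for h in range(1, n_harmonics + 1):
        center = int(round(h * f0_hz / bin_hz))
        for k in range(center - half_width_bins, center + half_width_bins + 1):
            if 0 <= k < n_bins:
                mask[k] = 1
    return mask


def combine_masks(rpca_mask, harm_mask):
    """Assumed combination rule: keep a bin only if both binary masks pass it."""
    return [a & b for a, b in zip(rpca_mask, harm_mask)]
```

In such a sketch, `shs_saliency` would be applied to each frame of the separated voice, `viterbi_path` would pick one F0 candidate per frame, and the resulting harmonic mask would be combined with a binary RPCA mask (supplied by a separate, hypothetical RPCA step) to refine the separation.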

Keywords

acoustic signal processing; principal component analysis; source separation; speech processing; F0 saliency spectrogram; RPCA; SHS; harmonic components; harmonic structure; mutually dependent F0 estimation; overtones; popular-music audio signals; robust principal component analysis; separated singing voice; singing voice analysis; singing voice separation; subharmonic summation; target music signal; target signal; time-frequency mask; vocal fundamental frequency estimation; Harmonic analysis; Indexes; Silicon carbide; Viterbi algorithm; Vocal F0 estimation; melody extraction; robust principal component analysis (RPCA); subharmonic summation (SHS)

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178034

Filename

7178034