Title :
Singing voice analysis and editing based on mutually dependent F0 estimation and source separation
Author :
Ikemiya, Yukara ; Yoshii, Kazuyoshi ; Itoyama, Katsutoshi
Author_Institution :
Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan
Abstract :
This paper presents a novel framework that improves both vocal fundamental frequency (F0) estimation and singing voice separation by exploiting the mutual dependency of the two tasks. A typical approach to singing voice separation is to estimate the vocal F0 contour from a target music signal and then extract the singing voice using a time-frequency mask that passes only the harmonic components at the vocal F0s and their overtones. Conversely, vocal F0 estimation becomes easier if the singing voice can first be extracted accurately from the target signal. Most conventional studies have paid little attention to this mutual dependency. To overcome this limitation, our framework alternates between the two tasks, using the result of each in the other. More specifically, we first extract the singing voice by using robust principal component analysis (RPCA). The F0 contour is then estimated from the separated singing voice by finding the optimal path over an F0-saliency spectrogram based on subharmonic summation (SHS). The estimated F0 contour in turn lets us improve singing voice separation by combining a time-frequency mask based on RPCA with a mask based on harmonic structures. Experimental results, obtained by using the proposed technique to directly edit vocal F0s in popular-music audio signals, showed that it significantly improved both vocal F0 estimation and singing voice separation.
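The F0-tracking and masking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the harmonic decay factor, the linear jump penalty, the mask width, and the logical-AND rule for combining masks are all assumptions made for illustration. RPCA itself is not implemented here; a simple threshold mask stands in for the RPCA-based mask, and a synthetic magnitude spectrogram stands in for real audio.

```python
import numpy as np

def shs_saliency(mag, freqs, f0_grid, n_harm=5, decay=0.84):
    """SHS-style saliency: for each candidate F0, sum the decayed magnitudes
    found at its first n_harm harmonics (nearest frequency bin)."""
    sal = np.zeros((len(f0_grid), mag.shape[1]))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harm + 1):
            if h * f0 > freqs[-1]:
                break
            k = np.argmin(np.abs(freqs - h * f0))
            sal[i] += decay ** (h - 1) * mag[k]
    return sal

def viterbi_path(sal, jump_penalty=0.1):
    """Best path over the saliency map; the linear penalty on index jumps
    is an assumed smoothness term, not taken from the paper."""
    n_f0, n_frames = sal.shape
    idx = np.arange(n_f0)
    trans = -jump_penalty * np.abs(idx[:, None] - idx[None, :])  # [prev, cur]
    score = sal[:, 0].copy()
    back = np.zeros((n_f0, n_frames), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + trans            # score of each prev -> cur move
        back[:, t] = np.argmax(cand, axis=0)     # best predecessor per state
        score = np.max(cand, axis=0) + sal[:, t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(score)
    for t in range(n_frames - 1, 0, -1):         # backtrack
        path[t - 1] = back[path[t], t]
    return path

def harmonic_mask(f0_track, freqs, n_harm=20, width_hz=40.0):
    """Binary mask that passes bins within width_hz/2 of each harmonic."""
    mask = np.zeros((len(freqs), len(f0_track)), dtype=bool)
    for t, f0 in enumerate(f0_track):
        for h in range(1, n_harm + 1):
            mask[:, t] |= np.abs(freqs - h * f0) <= width_hz / 2
    return mask

# Synthetic example: a tone at 220 Hz for 10 frames, then 330 Hz for 10 frames.
rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 4000.0, 401)            # 10 Hz bin spacing
mag = 0.01 * rng.random((len(freqs), 20))        # low-level noise floor
for t in range(20):
    f0_true = 220.0 if t < 10 else 330.0
    for h in range(1, 6):
        mag[np.argmin(np.abs(freqs - h * f0_true)), t] += 1.0 / h

f0_grid = np.arange(100.0, 500.0, 10.0)          # candidate F0s in Hz
f0_track = f0_grid[viterbi_path(shs_saliency(mag, freqs, f0_grid))]

rpca_mask = mag > 0.05                           # stand-in for an RPCA-based mask
combined = rpca_mask & harmonic_mask(f0_track, freqs)  # assumed combination rule
vocal_mag = mag * combined                       # masked "vocal" spectrogram
```

In this toy setting the tracked contour follows the step from 220 Hz to 330 Hz, and the combined mask keeps only bins that are both energetic and close to a harmonic of the tracked F0. A real system would compute the spectrogram from audio and obtain the first mask from an actual RPCA decomposition.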
Keywords :
acoustic signal processing; principal component analysis; source separation; speech processing; F0 saliency spectrogram; RPCA; SHS; harmonic components; harmonic structure; mutually dependent F0 estimation; overtones; popular-music audio signals; robust principal component analysis; separated singing voice; singing voice analysis; singing voice separation; subharmonic summation; target music signal; target signal; time-frequency mask; vocal fundamental frequency estimation; Harmonic analysis; Indexes; Silicon carbide; Viterbi algorithm; Vocal F0 estimation; melody extraction; robust principal component analysis (RPCA); subharmonic summation (SHS)
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178034