• DocumentCode
    730130
  • Title
    Singing voice analysis and editing based on mutually dependent F0 estimation and source separation
  • Author
    Ikemiya, Yukara ; Yoshii, Kazuyoshi ; Itoyama, Katsutoshi
  • Author_Institution
    Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    574
  • Lastpage
    578
  • Abstract
    This paper presents a novel framework that improves both vocal fundamental frequency (F0) estimation and singing voice separation by exploiting the mutual dependency of the two tasks. A typical approach to singing voice separation is to estimate the vocal F0 contour from a target music signal and then extract the singing voice by using a time-frequency mask that passes only the harmonic components of the vocal F0 and its overtones. Conversely, vocal F0 estimation becomes easier if the singing voice alone can be extracted accurately from the target signal. This mutual dependency has received little attention in conventional studies. To overcome this limitation, our framework alternates between the two tasks, using the result of each in the other. More specifically, we first extract the singing voice by using robust principal component analysis (RPCA). The F0 contour is then estimated from the separated singing voice by finding the optimal path over an F0-saliency spectrogram based on subharmonic summation (SHS). This enables us to improve singing voice separation by combining a time-frequency mask based on RPCA with a mask based on harmonic structures. Experimental results obtained when we used the proposed technique to directly edit vocal F0s in popular-music audio signals showed that it significantly improved both vocal F0 estimation and singing voice separation.
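    The F0-tracking and harmonic-masking steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the harmonic count, the decay weight of 0.84, the linear jump penalty, and the fixed mask width in Hz are all assumptions chosen for illustration, and the RPCA stage is omitted.

```python
import numpy as np

def shs_saliency(mag, freqs, f0_grid, n_harm=5, decay=0.84):
    """Subharmonic summation for one frame (illustrative parameters).

    For each F0 candidate, sum the spectral magnitude sampled at its
    first n_harm harmonics, weighting harmonic h by decay**(h - 1).
    """
    sal = np.zeros(len(f0_grid))
    for h in range(1, n_harm + 1):
        # Linear interpolation of the magnitude spectrum at h * f0;
        # harmonics above the last frequency bin contribute zero.
        sal += decay ** (h - 1) * np.interp(h * f0_grid, freqs, mag, right=0.0)
    return sal

def viterbi_f0_path(saliency, jump_penalty=1.0):
    """Find the candidate-index path through a (T, K) saliency matrix.

    Maximizes total saliency minus jump_penalty * |k_t - k_{t-1}|
    (a hypothetical smoothness cost, assumed for this sketch).
    """
    T, K = saliency.shape
    idx = np.arange(K)
    trans = -jump_penalty * np.abs(idx[:, None] - idx[None, :])  # trans[i, j]: i -> j
    score = saliency[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans          # cand[i, j]: best score ending at j via i
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each j
        score = cand[back[t], idx] + saliency[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):              # backtrack from the last frame
        path[t - 1] = back[t, path[t]]
    return path

def harmonic_mask(freqs, f0, n_harm=5, width_hz=20.0):
    """Binary mask for one frame: True within width_hz of each harmonic of f0."""
    mask = np.zeros(len(freqs), dtype=bool)
    for h in range(1, n_harm + 1):
        mask |= np.abs(freqs - h * f0) <= width_hz
    return mask

# Toy example: a synthetic spectrum with harmonics of 220 Hz on a 10 Hz grid.
freqs = np.arange(0, 4000, 10.0)
mag = np.zeros(len(freqs))
for h in range(1, 6):
    mag[22 * h] = 1.0
f0_grid = np.arange(100, 500, 10.0)
frame_sal = shs_saliency(mag, freqs, f0_grid)

# Five identical frames plus one spurious peak that the path should ignore.
sal = np.tile(frame_sal, (5, 1))
sal[2, 35] = frame_sal.max() + 1.0
path = viterbi_f0_path(sal, jump_penalty=1.0)
f0_track = f0_grid[path]
mask = harmonic_mask(freqs, f0_track[0])
```

    In the framework described above, such a harmonic mask would be combined with the RPCA-based mask; the abstract does not state how the two are combined, so that step is left out here.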
  • Keywords
    acoustic signal processing; principal component analysis; source separation; speech processing; F0 saliency spectrogram; RPCA; SHS; harmonic components; harmonic structure; mutually dependent F0 estimation; overtones; popular-music audio signals; robust principal component analysis; separated singing voice; singing voice analysis; singing voice separation; subharmonic summation; target music signal; target signal; time-frequency mask; vocal fundamental frequency estimation; Harmonic analysis; Indexes; Silicon carbide; Viterbi algorithm; Vocal F0 estimation; melody extraction; robust principal component analysis (RPCA); subharmonic summation (SHS)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178034
  • Filename
    7178034