• DocumentCode
    73668
  • Title
    An Acoustic-Phonetic Model of F0 Likelihood for Vocal Melody Extraction
  • Author
    Yu-Ren Chien; Hsin-Min Wang; Shyh-Kang Jeng
  • Author_Institution
    Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • Volume
    23
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 2015
  • Firstpage
    1457
  • Lastpage
    1468
  • Abstract
    This paper presents a novel approach to extraction of vocal melodies from accompanied singing recordings. Central to our approach is a model of vocal fundamental frequency (F0) likelihood that integrates acoustic-phonetic knowledge and real-world data. This model consists of a timbral fitness score and a loudness measure of each F0 candidate. Timbral fitness is measured for the partial amplitudes of an F0 candidate, with respect to a small set of vocal timbre examples. This F0-specific measurement of timbral fitness depends on an acoustic-phonetic F0 modification of each timbre example. In the loudness part of the likelihood model, sinusoids are detected, tracked, and pruned to give loudness values that minimize interference from the accompaniment. A final F0 estimate is determined by a prior model of F0 sequence in addition to the likelihood model. Melody extraction is completed by detecting voiced time positions according to the singing voice loudness variations given by the estimated F0 sequence. The numerical parameters involved in our approach were optimized on three development sets from different sources before the system was evaluated on ten test sets separate from these development sets. Controlled experiments show that use of the timbral fitness score accounts for a 13% difference in overall accuracy.
  • Keywords
    acoustic signal processing; F0 likelihood; F0 sequence; acoustic-phonetic F0 modification; loudness measure; melody extraction; singing voice loudness variations; timbral fitness score; vocal fundamental frequency likelihood; vocal melody extraction; Frequency estimation; Instruments; Numerical models; Shape; Speech; Timbre; Time-frequency analysis; Acoustic phonetics; F0 estimation; F0 modification; singing voice; vocal timbre examples
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2436345
  • Filename
    7111258