  • DocumentCode
    56725
  • Title
    SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve
  • Author
    Molina, Emilio ; Tardon, Lorenzo J. ; Barbancho, Ana M. ; Barbancho, Isabel
  • Author_Institution
    ATIC Res. Group, Univ. de Malaga, Malaga, Spain
  • Volume
    23
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 2015
  • Firstpage
    252
  • Lastpage
    263
  • Abstract
    In this paper, we present a method for monophonic singing transcription based on hysteresis defined on the pitch-time curve. This method is designed to perform note segmentation even when the pitch evolution during the same note behaves unstably, as in the case of untrained singers. The selected approach estimates the regions in which the chroma is stable; these regions are then classified as voiced or unvoiced by a decision tree classifier using two descriptors based on aperiodicity and power. Then, a note segmentation stage based on pitch intervals of the sung signal is carried out. To this end, a dynamic averaging of the pitch curve is performed after the beginning of a note is detected in order to roughly estimate the pitch. Deviations of the actual pitch curve with respect to this average are measured to determine the next note change according to a hysteresis process defined on the pitch-time curve. Finally, each note is labeled using three single values: rounded pitch (to semitones), duration and volume. Also, a complete evaluation methodology is presented that includes the definition of different relevant types of errors, measures, and a method for computing the evaluation measures. The proposed system significantly improves the performance of the baseline approach and attains results similar to previous approaches.
  • Keywords
    acoustic signal processing; decision trees; hysteresis; SiPTH; decision tree classifier; hysteresis process; monophonic singing transcription; note segmentation; pitch curve; pitch evolution; pitch intervals; pitch-time curve; Decision trees; Feature extraction; Hysteresis; Indexes; Labeling; Speech; Speech processing; Acoustic signal processing; fundamental frequency; pitch; singing transcription; singing voice analysis
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2331102
  • Filename
    6837431
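
The abstract describes note segmentation driven by deviations of the pitch curve from a running average, with note changes decided by a hysteresis process. The record gives no thresholds or exact rules, so the following is only a minimal sketch of that general idea, not the authors' algorithm. The function name `segment_notes`, the two thresholds `high` and `low` (in semitones), and the `min_frames` confirmation length are all assumptions introduced for illustration; the input is assumed to be an already voiced pitch curve in semitones, one value per frame.

```python
def segment_notes(pitch, high=0.8, low=0.4, min_frames=3):
    """Split a voiced pitch curve (semitones, one value per frame) into notes.

    Hypothetical two-threshold (hysteresis) rule, assumed for illustration:
    a deviation above `high` from the running average opens a candidate note
    change; it is confirmed once the deviation has stayed at or above `low`
    for `min_frames` frames, and cancelled if it falls below `low` earlier.
    Returns a list of (start_frame, end_frame_exclusive, rounded_pitch).
    """
    notes = []
    start = 0       # first frame of the current note
    total = 0.0     # sum of pitch values averaged into the current note
    count = 0       # number of frames in that sum
    cand = None     # frame where a possible note change began, if any

    for i, p in enumerate(pitch):
        if count == 0:
            total, count = p, 1
            continue
        dev = abs(p - total / count)  # deviation from the dynamic average
        if cand is None:
            if dev > high:
                cand = i              # open a candidate note change
            else:
                total += p
                count += 1
        elif dev < low:
            # Deviation died out: fold the candidate frames back into the note.
            for q in pitch[cand:i + 1]:
                total += q
                count += 1
            cand = None
        elif i - cand + 1 >= min_frames:
            # Deviation persisted: close the current note at the candidate.
            notes.append((start, cand, round(total / count)))
            start = cand
            seg = pitch[cand:i + 1]
            total, count = sum(seg), len(seg)
            cand = None

    if count:
        # An unconfirmed trailing candidate stays part of the last note.
        notes.append((start, len(pitch), round(total / count)))
    return notes
```

For example, `segment_notes([60.1, 59.9, 60.2, 60.0, 62.1, 61.9, 62.0, 62.2])` yields two notes, frames 0 to 4 at pitch 60 and frames 4 to 8 at pitch 62, while a single-frame spike such as `[60.0, 60.0, 61.5, 60.0, 60.0]` is absorbed into one note. Duration follows from the frame span; volume labeling is not covered by this sketch.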