• DocumentCode
    52138
  • Title
    Multi-pitch Streaming of Harmonic Sound Mixtures
  • Author
    Zhiyao Duan; Jinyu Han; Bryan Pardo
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Rochester, Rochester, NY, USA
  • Volume
    22
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    138
  • Lastpage
    150
  • Abstract
    Multi-pitch analysis of concurrent sound sources is an important but challenging problem. It requires estimating the pitch values of all harmonic sources in individual frames and streaming these pitch estimates into trajectories, each corresponding to one source. We address the streaming problem for monophonic sound sources. Our approach takes as input the original audio plus frame-level pitch estimates from any multi-pitch estimation algorithm, and outputs a pitch trajectory for each source. It does not require pre-training source models on isolated recordings. Instead, it casts the problem as a constrained clustering problem, where each cluster corresponds to a source. The clustering objective is to minimize the timbre inconsistency within each cluster. We explore different timbre features for music and speech. For music, harmonic structure and a newly proposed feature called the uniform discrete cepstrum (UDC) are found to be effective, while for speech, mel-frequency cepstral coefficients (MFCC) and UDC work well. We also show that timbre consistency alone is insufficient for effective streaming, so constraints are imposed on pairs of pitch estimates according to their time-frequency relationships. We propose a new constrained clustering algorithm that satisfies as many constraints as possible while optimizing the clustering objective. We compare the proposed approach with state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech, and it achieves better or comparable results.
  • Keywords
    pattern clustering; speech processing; time-frequency analysis; MFCC; UDC; cochannel speech; concurrent sound sources; constrained clustering algorithm; frame-level pitch estimation; harmonic sound mixtures; harmonic structure; monophonic sound sources; multipitch estimation algorithm; multipitch streaming analysis; pitch trajectory; speech; supervised multipitch streaming approach; timbre-consistency; time-frequency relationships; uniform discrete cepstrum; unsupervised multipitch streaming approach; Clustering algorithms; Instruments; Speech; Speech processing; Timbre; Trajectory; Cochannel speech; constrained clustering; multi-pitch analysis; pitch streaming; timbre tracking;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2013.2285484
  • Filename
    6633082
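The abstract states the approach only at the level of an objective (within-cluster timbre inconsistency) and pairwise time-frequency constraints; it does not give the algorithm. As a hedged illustration of that kind of constrained clustering, here is a minimal Python sketch built on assumptions that are not taken from the paper. The function name `stream_pitches` and its parameters (`ml_semitones`, `penalty`, `max_iter`) are hypothetical. Pitches are assumed to be in MIDI note numbers. Timbre inconsistency is modelled as the squared Euclidean distance of each feature vector to its stream centroid. The only constraints are a hard cannot-link between estimates in the same frame (a monophonic source has at most one pitch per frame) and a soft must-link between estimates in adjacent frames whose pitches lie within `ml_semitones`. The optimizer is a simple greedy coordinate descent over per-frame label permutations, not the authors' constrained clustering algorithm.

```python
from itertools import permutations

import numpy as np


def stream_pitches(frames, pitches, feats, k, ml_semitones=0.5, penalty=1.0, max_iter=20):
    """Hypothetical sketch: assign each frame-level pitch estimate to one of k streams.

    frames  : frame index of each estimate
    pitches : pitch of each estimate, assumed to be in MIDI note numbers
    feats   : (n, d) timbre features of each estimate (e.g. harmonic structure, UDC, MFCC)
    """
    frames = np.asarray(frames)
    pitches = np.asarray(pitches, dtype=float)
    feats = np.asarray(feats, dtype=float)
    by_frame = {f: np.flatnonzero(frames == f) for f in np.unique(frames)}
    if any(len(idx) > k for idx in by_frame.values()):
        raise ValueError("a frame has more pitch estimates than streams")

    # Initialise by pitch order within each frame (lowest pitch -> stream 0).
    labels = np.empty(len(frames), dtype=int)
    for idx in by_frame.values():
        labels[idx[np.argsort(pitches[idx])]] = np.arange(len(idx))

    # Soft must-link partners: estimates in an adjacent frame with a close pitch.
    partners = [[j for g in (frames[i] - 1, frames[i] + 1)
                 for j in by_frame.get(g, [])
                 if abs(pitches[i] - pitches[j]) < ml_semitones]
                for i in range(len(frames))]

    for _ in range(max_iter):
        # Timbre centroid of each stream (global mean if a stream is empty).
        centroids = np.array([feats[labels == c].mean(axis=0) if np.any(labels == c)
                              else feats.mean(axis=0) for c in range(k)])
        changed = False
        for idx in by_frame.values():
            # Hard cannot-link: only permutations are tried, so no two
            # estimates in the same frame can share a stream.
            best, best_cost = None, np.inf
            for perm in permutations(range(k), len(idx)):
                cost = sum(np.sum((feats[i] - centroids[c]) ** 2)  # timbre inconsistency
                           + penalty * sum(labels[j] != c for j in partners[i])  # broken must-links
                           for i, c in zip(idx, perm))
                if cost < best_cost:
                    best, best_cost = perm, cost
            if tuple(labels[idx].tolist()) != best:
                labels[idx] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Two synthetic sources: A (timbre [0, 0]) and B (timbre [5, 5]); their pitches
    # cross after frame 5, so pitch-order initialisation mislabels frames 6 and 7.
    frames, pitches, feats = [], [], []
    for f in range(8):
        a_pitch, b_pitch = (60.0, 64.0) if f < 6 else (67.0, 62.0)
        frames += [f, f]
        pitches += [a_pitch, b_pitch]
        feats += [[0.0, 0.0], [5.0, 5.0]]
    labels = stream_pitches(frames, pitches, feats, k=2)
    print(labels.reshape(-1, 2))  # every row (frame) is [0 1]: A -> stream 0, B -> stream 1
```

In the toy example, pitch order alone would swap the two sources once their pitches cross, and the timbre term pulls the crossed frames back into the right streams. Enumerating permutations grows factorially with the number of concurrent sources, so this brute-force per-frame step is only practical for small polyphony.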