DocumentCode :
112798
Title :
Multiple F0 Estimation and Source Clustering of Polyphonic Music Audio Using PLCA and HMRFs
Author :
Arora, Vipul ; Behera, Laxmidhar
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Kanpur, Kanpur, India
Volume :
23
Issue :
2
fYear :
2015
fDate :
Feb. 2015
Firstpage :
278
Lastpage :
287
Abstract :
Source transcription of pitched polyphonic music entails providing the pitch (F0) values corresponding to each source in a separate channel. This problem is an important step towards many important problems in music and speech processing. It involves 1) estimating the multiple F0 values in each short time frame, and 2) clustering the F0 values into streams corresponding to different sources. We address the problem in an unsupervised way, with only the total number of sources given beforehand. The framework of probabilistic latent component analysis (PLCA) is used to decompose the polyphonic short-time magnitude spectra for multiple F0 estimation and source-specific feature extraction. It is further embedded into the structure of hidden Markov random fields (HMRF) for clustering the F0s into different sources. This clustering is constrained by the cognitive grouping of continuous F0 contours as well as segregation of simultaneous F0s into different source streams. Such constraints are effectively and elegantly modeled by the HMRFs. Simulated annealing varies the degree of constraints for better clustering. The paper also proposes a novel strategy using the trade-off between precision and recall of multiple F0 estimation for better clustering. Evaluations over a variety of datasets show the efficacy of the proposed algorithm and its robustness to the presence of spurious F0s while clustering. It also outperforms a state-of-the-art unsupervised source streaming algorithm in a set of comparative experiments.
Keywords :
audio signal processing; hidden Markov models; pattern clustering; probability; F0 value clustering; HMRF; PLCA; cognitive grouping; hidden Markov random fields; multiple F0 estimation; polyphonic music audio; polyphonic short-time magnitude spectra; probabilistic latent component analysis; source clustering; source transcription; source-specific feature extraction; speech processing; Estimation; Harmonic analysis; Hidden Markov models; IEEE transactions; Speech; Speech processing; Time-frequency analysis; Acoustic scene analysis; automatic music transcription; hidden Markov random fields; multiple F0 estimation; polyphonic instrument identification;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2387388
Filename :
7001182