DocumentCode :
1498530
Title :
A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching
Author :
Joder, Cyril ; Essid, Slim ; Richard, Gaël
Author_Institution :
Inst. TELECOM, TELECOM ParisTech, Paris, France
Volume :
19
Issue :
8
Year :
2011
Firstpage :
2385
Lastpage :
2397
Abstract :
In this paper, we introduce the use of conditional random fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models which are used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF models are proposed for our task, for different choices of tradeoff between accuracy and complexity. Three types of features are used, characterizing the local harmony, note attacks and tempo. We also propose a novel hierarchical approach, which takes advantage of the score structure for an approximate decoding of the statistical model. This strategy reduces the complexity, yielding a better overall efficiency than the classic beam search method used in HMM-based models. Experiments run on a large database of classical piano and popular music exhibit very accurate alignments. Indeed, with the best performing system, more than 95% of the note onsets are detected with a precision finer than 100 ms. We additionally show how the proposed framework can be modified in order to be robust to possible structural differences between the score and the musical performance.
Keywords :
audio signal processing; decoding; hidden Markov models; music; HMM-based models; approximate decoding; beam search method; classical piano; conditional random field framework; flexible dependency structures; popular music; robust audio-to-score matching; scalable audio-to-score matching; statistical models; complexity theory; concurrent computing; context; real-time systems; robustness; Viterbi algorithm; conditional random fields (CRFs); machine learning; music-to-score alignment
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2134092
Filename :
5752828