Regional Information Center for Science and Technology (RICeST) - A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

DocumentCode :

1498530

Title :

A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

Author :

Joder, Cyril ; Essid, Slim ; Richard, Gaël

Author_Institution :

Inst. TELECOM, TELECOM ParisTech, Paris, France

Volume :

Issue :

fYear :

2011

Firstpage :

2385

Lastpage :

2397

Abstract :

In this paper, we introduce the use of conditional random fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models which are used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF models are proposed for our task, for different choices of tradeoff between accuracy and complexity. Three types of features are used, characterizing the local harmony, note attacks and tempo. We also propose a novel hierarchical approach, which takes advantage of the score structure for an approximate decoding of the statistical model. This strategy reduces the complexity, yielding a better overall efficiency than the classic beam search method used in HMM-based models. Experiments run on a large database of classical piano and popular music exhibit very accurate alignments. Indeed, with the best performing system, more than 95% of the note onsets are detected with a precision finer than 100 ms. We additionally show how the proposed framework can be modified in order to be robust to possible structural differences between the score and the musical performance.

Keywords :

audio signal processing; decoding; hidden Markov models; music; HMM-based models; approximate decoding; beam search method; classical piano; conditional random field framework; flexible dependency structures; popular music exhibit; robust audio-to-score matching; scalable audio-to-score matching; statistical models; Complexity theory; Concurrent computing; Context; Decoding; Hidden Markov models; Real time systems; Robustness; Audio signal processing; Viterbi algorithm; conditional random fields (CRFs); machine learning; music; music-to-score alignment

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2134092

Filename :

5752828

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1498530