Abstract :
Computational modeling of musical timbre is important for a variety of music information retrieval applications. While considerable progress has been made in recognizing musical genres and instruments, relatively little attention has been paid to modeling playing techniques, which affect timbre in more subtle ways. In this paper, we contribute to this area of research by systematically evaluating various audio features and processing methods for multi-class playing technique classification, considering up to nine distinct playing techniques of bowed string instruments. Specifically, the evaluation uses a collection of 6,759 chamber-recorded single notes of four bowed string instruments and a collection of 33 real-world solo violin recordings. Our evaluation shows that sparse features extracted from the magnitude spectra and from phase derivatives, including the group delay function (GDF) and instantaneous frequency deviation (IFD), lead to significantly better performance than a combination of state-of-the-art temporal, spectral, cepstral, and harmonic feature descriptors. For playing technique classification of violin single notes, the former approach attains a macro-average F-score of 0.915 under tenfold cross-validation, while the latter attains only 0.835. Moreover, sparse modeling of magnitude and phase-derived spectra also performs well for single-note joint instrument-technique classification (F-score 0.770) and for playing technique classification of real-world violin solos (F-score 0.547). We find that phase information is particularly important in discriminating playing techniques with subtle differences, such as playing at different bowing positions (i.e., normal, sul tasto, and sul ponticello). A systematic investigation of the effect of parameters such as window size, hop factor, and window type on phase-derived features is also reported to provide further insight.
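As an illustration of the two phase-derived representations named above, the following is a minimal NumPy sketch of their textbook definitions, assuming a periodic Hann window and phases referenced to the first sample of each frame: the GDF is taken as the negative phase difference between adjacent frequency bins, and the IFD as the deviation of the phase advance between consecutive frames from the advance expected at each bin's centre frequency. The function names (`stft`, `princarg`, `group_delay`, `inst_freq_deviation`) and the window/hop values are hypothetical choices for illustration; this is not the authors' feature pipeline, and the sparse-coding stage is not shown.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex rfft of Hann-windowed frames; shape (n_frames, n_fft // 2 + 1).
    Assumption: phases are referenced to the first sample of each frame."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop : m * hop + n_fft] * window for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def princarg(phi):
    """Wrap phase values to the principal interval [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

def group_delay(X, n_fft):
    """GDF: negative wrapped phase difference across adjacent bins, in samples.
    Returns shape (n_frames, n_bins - 1)."""
    dphi = princarg(np.diff(np.angle(X), axis=1))
    return -dphi * n_fft / (2 * np.pi)

def inst_freq_deviation(X, n_fft, hop, fs):
    """IFD: deviation (Hz) of the measured frame-to-frame phase advance from
    the advance expected at each bin centre. Returns shape (n_frames - 1, n_bins)."""
    k = np.arange(X.shape[1])
    expected = 2 * np.pi * k * hop / n_fft
    dphi = princarg(np.diff(np.angle(X), axis=0) - expected)
    return dphi * fs / (2 * np.pi * hop)

# Usage: refine the frequency of a 1010 Hz tone that falls between bin centres.
fs, n_fft, hop = 8000, 512, 128
x = np.cos(2 * np.pi * 1010.0 * np.arange(fs) / fs)
X = stft(x, n_fft, hop)
ifd = inst_freq_deviation(X, n_fft, hop, fs)
k_peak = int(np.argmax(np.abs(X[0])))
estimate = k_peak * fs / n_fft + ifd[0, k_peak]  # bin centre + deviation
print(k_peak, estimate)
```

The IFD estimate is unambiguous only while the true frequency lies within fs / (2 * hop) of the bin centre, which is one reason the hop factor matters for phase-derived features.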
Keywords :
information retrieval; music; musical instruments; GDF; IFD; audio features methods; audio processing methods; bowed string instruments; computational modeling; group delay function; hop factors; instantaneous frequency deviation; magnitude derived spectra; magnitude spectra; multiclass playing technique classification; music information retrieval applications; musical genres; musical timbre; phase derivatives; phase derived features; phase derived spectra; phase information; playing technique classification; real-world violin solos; sparse feature extraction; sparse modeling; violin single notes; window sizes; window types; Encoding; Feature extraction; Harmonic analysis; Instruments; Timbre; Time-frequency analysis; phase; sparse coding;
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing