Improving music auto-tagging by intra-song instance bagging

Author

Yeh, Chin-Chia Michael ; Ju-Chiang Wang ; Yi-Hsuan Yang ; Hsin-Min Wang

Author_Institution

Res. Center for Inf. Technol. Innovation, Acad. Sinica, Taipei, Taiwan

fYear

2014

fDate

4-9 May 2014

Firstpage

2139

Lastpage

2143

Abstract

Bagging is one of the most classic ensemble learning techniques in the machine learning literature. The idea is to generate multiple subsets of the training data via bootstrapping (random sampling with replacement), and then aggregate the outputs of the models trained on each subset via voting or averaging. As music is a temporal signal, we propose and study two bagging methods in this paper: inter-song instance bagging, which bootstraps song-level features, and intra-song instance bagging, which draws bootstrap samples directly from the short-time features of each training song. In particular, we focus on the latter method, as it better exploits the temporal information of music signals. The bagging methods result in surprisingly effective models for music auto-tagging: incorporating the idea into a simple system based on a linear support vector machine (SVM) yields accuracies that are comparable or even superior to those of state-of-the-art, possibly more sophisticated methods on three different datasets. As the bagging method is a meta-algorithm, it holds the promise of improving other MIR systems.
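
The intra-song procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each song is a list of short-time (frame-level) feature vectors, that frames are mean-pooled into one song-level vector, and that model outputs are averaged. The base learner here is a simple class-centroid scorer standing in for the paper's linear SVM, and all function names (`bootstrap`, `mean_pool`, `train_centroid_scorer`, `intra_song_bagging`, `predict`) are hypothetical.

```python
import random
from statistics import fmean


def bootstrap(items, rng):
    """Draw len(items) items uniformly at random WITH replacement."""
    return [rng.choice(items) for _ in items]


def mean_pool(frames):
    """Average frame-level feature vectors into one song-level vector."""
    return [fmean(column) for column in zip(*frames)]


def train_centroid_scorer(vectors, labels):
    """Stand-in base learner (an assumption, NOT the paper's linear SVM).

    Returns a linear scoring function whose weight vector is the difference
    between the positive and negative class centroids. Expects binary labels
    (1 = tag present, 0 = tag absent) with both classes represented.
    """
    pos = mean_pool([v for v, y in zip(vectors, labels) if y == 1])
    neg = mean_pool([v for v, y in zip(vectors, labels) if y == 0])
    w = [p - n for p, n in zip(pos, neg)]
    # Bias places the decision boundary midway between the two centroids.
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, pos, neg))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def intra_song_bagging(songs, labels, n_bags, rng):
    """Train one model per bag.

    For every bag, the frames of EACH song are re-sampled with replacement
    before pooling, so every bag sees all songs but a different view of each.
    """
    models = []
    for _ in range(n_bags):
        pooled = [mean_pool(bootstrap(frames, rng)) for frames in songs]
        models.append(train_centroid_scorer(pooled, labels))
    return models


def predict(models, frames):
    """Aggregate the ensemble by averaging the individual model scores."""
    x = mean_pool(frames)
    return fmean([model(x) for model in models])
```

Under these assumptions, a positive averaged score from `predict` would indicate that the tag applies to the song. Because every bag still contains every training song, this variant avoids bags that miss a class entirely, which inter-song bootstrapping of song-level vectors could produce on small datasets.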

Keywords

audio signal processing; bootstrapping; learning (artificial intelligence); musical acoustics; signal classification; signal sampling; support vector machines; MIR systems; SVM; bootstrapping; ensemble learning techniques; intra-song instance bagging method; linear support vector machine; machine learning literature; meta algorithm; music auto-tagging; music signals; random sampling; short-time features; song-level features; temporal information; temporal signal; training data; training song; Bagging; Dictionaries; Feature extraction; Music; Support vector machines; Training; Vectors; Bagging; ensemble classification; feature pooling; music auto-tagging; sparse coding;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6853977

Filename

6853977