Improved music feature learning with deep neural networks

Author

Sigtia, Siddharth ; Dixon, Simon

Author_Institution

Centre for Digital Music, Queen Mary Univ. of London, London, UK

fYear

2014

fDate

4-9 May 2014

Firstpage

6959

Lastpage

6963

Abstract

Recent advances in neural network training provide a way to efficiently learn representations from raw data. Good representations are an important requirement for Music Information Retrieval (MIR) tasks to be performed successfully. However, a major problem with neural networks is that training time becomes prohibitive for very large datasets, and the learning algorithm can get stuck in local minima for very deep and wide network architectures. In this paper we examine three ways to improve feature learning for audio data using neural networks: (1) using Rectified Linear Units (ReLUs) instead of standard sigmoid units; (2) using a powerful regularisation technique called Dropout; (3) using Hessian-Free (HF) optimisation to improve training of sigmoid nets. We show that these methods provide significant improvements in training time and that the features learnt are better than state-of-the-art handcrafted features, with a genre classification accuracy of 83 ± 1.1% on the Tzanetakis (GTZAN) dataset. We found that the rectifier networks learnt better features than the sigmoid networks. We also demonstrate the capacity of the features to capture relevant information from audio data by applying them to genre classification on the ISMIR 2004 dataset.
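
As an illustration only, the first two techniques named in the abstract (rectified linear units versus sigmoid units, and Dropout) can be sketched in a few lines of Python. This is not the authors' implementation: the function names, the example inputs, the drop probability of 0.5, and the use of the "inverted" dropout scaling are all assumptions made for this sketch. Hessian-Free optimisation is not shown.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive inputs, outputs 0 otherwise."""
    return max(0.0, x)


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout (an assumed variant, chosen for this sketch).

    During training each unit is zeroed with probability p_drop and the
    surviving units are divided by (1 - p_drop), so the expected activation
    is unchanged. Outside training the activations pass through untouched.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical pre-activation values for one hidden layer.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]

relu_out = [relu(x) for x in pre_activations]
sigmoid_out = [sigmoid(x) for x in pre_activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = dropout(relu_out, p_drop=0.5, rng=rng)
```

In this sketch the rectifier outputs exact zeros for non-positive inputs, whereas the sigmoid output is always strictly between 0 and 1; dropout then randomly silences a further subset of units on each training pass.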

Keywords

information retrieval; learning (artificial intelligence); music; neural nets; Dropout; GTZAN dataset; HF optimisation; Hessian-free optimisation; MIR tasks; ReLU; Tzanetakis dataset; audio data; deep neural networks; feature learning; genre classification; music feature learning; music information retrieval; neural network training; rectified linear units; rectifier networks; regularisation technique; sigmoid nets; training time; Accuracy; Feature extraction; Neural networks; Optimization; Training; Training data; Deep Learning; MIR; Neural Networks

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854949

Filename

6854949