Regional Information Center for Science and Technology (RICEST) - Improved music feature learning with deep neural networks

DocumentCode :

180126

Title :

Improved music feature learning with deep neural networks

Author :

Sigtia, Siddharth ; Dixon, Simon

Author_Institution :

Centre for Digital Music, Queen Mary Univ. of London, London, UK

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

6959

Lastpage :

6963

Abstract :

Recent advances in neural network training provide a way to efficiently learn representations from raw data. Good representations are an important requirement for Music Information Retrieval (MIR) tasks to be performed successfully. However, a major problem with neural networks is that training time becomes prohibitive for very large datasets, and the learning algorithm can get stuck in local minima for very deep and wide network architectures. In this paper, we examine three ways to improve feature learning for audio data using neural networks: (1) using Rectified Linear Units (ReLUs) instead of standard sigmoid units; (2) using a powerful regularisation technique called Dropout; (3) using Hessian-Free (HF) optimisation to improve training of sigmoid nets. We show that these methods provide significant improvements in training time and that the features learnt are better than state-of-the-art handcrafted features, with a genre classification accuracy of 83 ± 1.1% on the Tzanetakis (GTZAN) dataset. We found that the rectifier networks learnt better features than the sigmoid networks. We also demonstrate the capacity of the features to capture relevant information from audio data by applying them to genre classification on the ISMIR 2004 dataset.
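
The first two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sigmoid`, `relu`, `dropout`, `hidden_layer`) are hypothetical, the layer sizes are arbitrary, and the dropout shown is the common "inverted" variant (survivors are rescaled during training), which is an assumption rather than something the record states. Hessian-Free optimisation is not shown.

```python
import math
import random


def sigmoid(x):
    """Standard logistic unit: squashes x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def dropout(activations, p_drop, rng):
    """Inverted dropout (assumed variant): zero each unit with probability
    p_drop and rescale survivors so the expected activation is unchanged."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def hidden_layer(inputs, weights, biases, unit, p_drop=0.0, rng=None):
    """One fully connected layer, then optional dropout (training mode)."""
    pre = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    out = [unit(z) for z in pre]
    if p_drop > 0.0:
        out = dropout(out, p_drop, rng or random.Random(0))
    return out
```

Swapping `unit=sigmoid` for `unit=relu` in `hidden_layer` is the only change needed to compare the two unit types in this sketch; at test time `p_drop` would be left at 0.0.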

Keywords :

information retrieval; learning (artificial intelligence); music; neural nets; Dropout; GTZAN dataset; HF optimisation; Hessian-free optimisation; MIR tasks; ReLU; Tzanetakis dataset; audio data; deep neural networks; feature learning; genre classification; music feature learning; music information retrieval; neural network training; rectified linear units; rectifier networks; regularisation technique; sigmoid nets; training time; Accuracy; Feature extraction; Neural networks; Optimization; Training; Training data; Deep Learning; MIR; Neural Networks

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854949

Filename :

6854949

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180126