• DocumentCode
    180126
  • Title

    Improved music feature learning with deep neural networks

  • Author

    Sigtia, Siddharth; Dixon, Simon

  • Author_Institution
    Centre for Digital Music, Queen Mary Univ. of London, London, UK
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    6959
  • Lastpage
    6963
  • Abstract
    Recent advances in neural network training provide a way to efficiently learn representations from raw data. Good representations are an important requirement for Music Information Retrieval (MIR) tasks to be performed successfully. However, a major problem with neural networks is that training time becomes prohibitive for very large datasets, and the learning algorithm can get stuck in local minima for very deep and wide network architectures. In this paper we examine three ways to improve feature learning for audio data using neural networks: (1) using Rectified Linear Units (ReLUs) instead of standard sigmoid units; (2) using a powerful regularisation technique called Dropout; (3) using Hessian-Free (HF) optimisation to improve training of sigmoid nets. We show that these methods provide significant improvements in training time and that the features learnt are better than state-of-the-art handcrafted features, with a genre classification accuracy of 83 ± 1.1% on the Tzanetakis (GTZAN) dataset. We found that the rectifier networks learnt better features than the sigmoid networks. We also demonstrate the capacity of the features to capture relevant information from audio data by applying them to genre classification on the ISMIR 2004 dataset.
  • Keywords
    information retrieval; learning (artificial intelligence); music; neural nets; Dropout; GTZAN dataset; HF optimisation; Hessian-free optimisation; MIR tasks; ReLU; Tzanetakis dataset; audio data; deep neural networks; feature learning; genre classification; music feature learning; music information retrieval; neural network training; rectified linear units; rectifier networks; regularisation technique; sigmoid nets; training time; Accuracy; Feature extraction; Neural networks; Optimization; Training; Training data; Deep Learning; MIR; Neural Networks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854949
  • Filename
    6854949