Title :
Speech-music discrimination: A deep learning perspective
Author :
Pikrakis, Aggelos ; Theodoridis, S.
Author_Institution :
Dept. of Inf., Univ. of Piraeus, Piraeus, Greece
Abstract :
This paper is a study of the problem of speech-music discrimination from a deep learning perspective. We experiment with two feature extraction schemes and investigate how network depth and RBM size affect the classification performance on publicly available datasets and on large amounts of audio data from video-sharing sites, without placing restrictions on the recording conditions. The main building block of our deep networks is the Restricted Boltzmann Machine (RBM) with binary, stochastic units. The stack of RBMs is pre-trained in a layer-wise mode and, subsequently, a fine-tuning stage trains the deep network as a whole with back-propagation. The proposed approach indicates that deep architectures can serve as strong classifiers for the broad binary problem of speech vs music, with satisfactory generalization performance.
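The layer-wise pre-training described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one-step contrastive divergence (CD-1) as the RBM training rule, inputs scaled to [0, 1], and full-batch updates. The names `BinaryRBM` and `pretrain_stack`, the layer sizes, the learning rate, and the epoch count are all hypothetical. The back-propagation fine-tuning stage is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryRBM:
    """RBM with binary, stochastic units, trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights and zero biases: a common default, assumed here.
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: sample binary hidden states given the data.
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step, using probabilities for the reconstruction.
        v1_prob = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1_prob)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
        self.b_vis += lr * (v0 - v1_prob).mean(axis=0)
        self.b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        # Mean squared reconstruction error, useful for monitoring only.
        return float(np.mean((v0 - v1_prob) ** 2))


def pretrain_stack(data, layer_sizes, epochs=5, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations of the RBM below it."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = BinaryRBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms
```

In a full pipeline, the weights of the pre-trained stack would initialize a feed-forward network with a two-class output layer (speech vs music), which is then trained as a whole with back-propagation.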
Keywords :
Boltzmann machines; audio signal processing; backpropagation; feature extraction; speech processing; RBM; audio data; deep learning perspective; restricted Boltzmann machine; speech-music discrimination; Associative memory; Computer architecture; Speech; Testing; Training; Vectors; Deep learning
Conference_Title :
2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO)
Conference_Location :
Lisbon