DocumentCode :
148467
Title :
Speech-music discrimination: A deep learning perspective
Author :
Pikrakis, Aggelos; Theodoridis, S.
Author_Institution :
Dept. of Inf., Univ. of Piraeus, Piraeus, Greece
fYear :
2014
fDate :
1-5 Sept. 2014
Firstpage :
616
Lastpage :
620
Abstract :
This paper studies the problem of speech-music discrimination from a deep learning perspective. We experiment with two feature extraction schemes and investigate how network depth and RBM size affect classification performance on publicly available datasets and on large amounts of audio data from video-sharing sites, without placing restrictions on the recording conditions. The main building block of our deep networks is the Restricted Boltzmann Machine (RBM) with binary, stochastic units. The stack of RBMs is pre-trained in a layer-wise mode and, subsequently, a fine-tuning stage trains the deep network as a whole with back-propagation. The results indicate that deep architectures can serve as strong classifiers for the broad binary problem of speech vs. music, with satisfactory generalization performance.
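The training pipeline summarized in the abstract (a stack of binary RBMs pre-trained greedily layer by layer, then unrolled into a feed-forward network and fine-tuned as a whole with back-propagation) can be sketched compactly in NumPy. The snippet below is an illustrative sketch only, not the authors' implementation: the CD-1 update, layer sizes, learning rates, logistic output unit, and the random stand-in "feature vectors" are all assumptions, and the paper's actual feature extraction schemes and network configurations are not reproduced here.

# Hypothetical sketch (not the paper's code): greedy layer-wise pretraining of a
# stack of binary RBMs with one step of contrastive divergence (CD-1), followed by
# back-propagation fine-tuning of the unrolled network for speech-vs-music labels.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, batch=32):
    """CD-1 training of one RBM with binary, stochastic hidden units."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)          # visible biases
    b_h = np.zeros(n_hidden)           # hidden biases
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            p_h0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # stochastic hidden states
            p_v1 = sigmoid(h0 @ W.T + b_v)                      # one-step reconstruction
            p_h1 = sigmoid(p_v1 @ W + b_h)
            W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
            b_v += lr * (v0 - p_v1).mean(axis=0)
            b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise pretraining: each RBM is trained on the previous layer's activations."""
    weights, biases, x = [], [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        weights.append(W)
        biases.append(b_h)
        x = sigmoid(x @ W + b_h)       # mean-field activations feed the next RBM
    return weights, biases

def fine_tune(data, labels, weights, biases, epochs=30, lr=0.1):
    """Unroll the RBM stack into a feed-forward net with a logistic output and run backprop."""
    W_out = 0.01 * rng.standard_normal(weights[-1].shape[1])
    b_out = 0.0
    for _ in range(epochs):
        # forward pass through the pretrained layers
        acts = [data]
        for W, b in zip(weights, biases):
            acts.append(sigmoid(acts[-1] @ W + b))
        p = sigmoid(acts[-1] @ W_out + b_out)          # e.g. P(music | frame)
        # backward pass (cross-entropy loss, sigmoid output)
        delta = (p - labels) / len(labels)
        grad_W_out = acts[-1].T @ delta
        delta_h = np.outer(delta, W_out) * acts[-1] * (1 - acts[-1])
        W_out -= lr * grad_W_out
        b_out -= lr * delta.sum()
        for k in range(len(weights) - 1, -1, -1):
            grad_W = acts[k].T @ delta_h
            grad_b = delta_h.sum(axis=0)
            if k > 0:                                   # propagate before updating weights[k]
                delta_h = (delta_h @ weights[k].T) * acts[k] * (1 - acts[k])
            weights[k] -= lr * grad_W
            biases[k] -= lr * grad_b
    return weights, biases, W_out, b_out

# Toy usage with random features standing in for the paper's audio feature vectors.
X = rng.random((200, 64))                      # 200 frames x 64-dim features in [0, 1]
y = rng.integers(0, 2, 200).astype(float)      # 0 = speech, 1 = music (arbitrary)
Ws, bs = pretrain_stack(X, layer_sizes=[128, 64])
fine_tune(X, y, Ws, bs)

The design choice this sketch mirrors is the one stated in the abstract: unsupervised pretraining supplies an initialization for the hidden layers, and the supervised fine-tuning stage then adjusts all weights jointly with back-propagation.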
Keywords :
Boltzmann machines; audio signal processing; backpropagation; feature extraction; speech processing; RBM; audio data; deep learning perspective; restricted Boltzmann machine; speech-music discrimination; Associative memory; Computer architecture; Feature extraction; Speech; Testing; Training; Vectors; Deep learning; Speech-Music Discrimination
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO)
Conference_Location :
Lisbon
Type :
conf
Filename :
6952182