Regional Information Center for Science and Technology - Speech-music discrimination: A deep learning perspective

DocumentCode :

148467

Title :

Speech-music discrimination: A deep learning perspective

Author :

Pikrakis, Aggelos; Theodoridis, S.

Author_Institution :

Dept. of Inf., Univ. of Piraeus, Piraeus, Greece

fYear :

2014

fDate :

1-5 Sept. 2014

Firstpage :

616

Lastpage :

620

Abstract :

This paper is a study of the problem of speech-music discrimination from a deep learning perspective. We experiment with two feature extraction schemes and investigate how network depth and RBM size affect the classification performance on publicly available datasets and on large amounts of audio data from video-sharing sites, without placing restrictions on the recording conditions. The main building block of our deep networks is the Restricted Boltzmann Machine (RBM) with binary, stochastic units. The stack of RBMs is pre-trained in a layer-wise mode and, subsequently, a fine-tuning stage trains the deep network as a whole with back-propagation. The proposed approach indicates that deep architectures can serve as strong classifiers for the broad binary problem of speech vs music, with satisfactory generalization performance.
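
The layer-wise pre-training of a stack of binary, stochastic RBMs mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the use of one-step contrastive divergence (CD-1), the learning rate, the layer sizes, and the names `BinaryRBM` and `pretrain_stack` are all assumptions chosen for illustration, and the input `X` is assumed to hold feature vectors already scaled to [0, 1]. The supervised fine-tuning stage with back-propagation is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryRBM:
    """Hypothetical RBM with binary stochastic units, trained with CD-1 (an assumption)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: sample binary hidden states given the data.
        p_h0 = self.hidden_probs(v0)
        h0 = (self.rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step down to the visibles and back up.
        p_v1 = self.visible_probs(h0)
        p_h1 = self.hidden_probs(p_v1)
        n = v0.shape[0]
        # Approximate log-likelihood gradient: data term minus reconstruction term.
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_vis += lr * (v0 - p_v1).mean(axis=0)
        self.b_hid += lr * (p_h0 - p_h1).mean(axis=0)


def pretrain_stack(X, layer_sizes, epochs=5, batch_size=32, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations of the RBM below it."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = BinaryRBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(data))
            for start in range(0, len(data), batch_size):
                rbm.cd1_step(data[order[start:start + batch_size]])
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # becomes the input of the next RBM
    return rbms
```

In a full pipeline, the weights of the pre-trained stack would initialise a feed-forward network with a two-class output layer (speech vs. music), which would then be fine-tuned as a whole with back-propagation.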

Keywords :

Boltzmann machines; audio signal processing; backpropagation; feature extraction; speech processing; RBM; audio data; deep learning perspective; restricted Boltzmann machine; speech-music discrimination; Associative memory; Computer architecture; Speech; Testing; Training; Vectors; Deep learning

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), 2014

Conference_Location :

Lisbon

Type :

conf

Filename :

6952182

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=148467