DocumentCode :
155613
Title :
Inferring social contexts from audio recordings using deep neural networks
Author :
Asgari, M.; Shafran, Izhak; Bayestehtashk, Alireza
Author_Institution :
Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR, USA
fYear :
2014
fDate :
21-24 Sept. 2014
Firstpage :
1
Lastpage :
6
Abstract :
In this paper, we investigate the problem of detecting social contexts from audio recordings of everyday life, such as life-logs. Unlike standard corpora of telephone speech or broadcast news, these recordings contain a wide variety of background noise. By the nature of such applications, it is difficult to collect and label all the representative noise needed to learn models in a fully supervised manner, and the amount of labeled data that can be expected is small relative to the available recordings. This lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
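As an illustration of the approach outlined in the abstract, the sketch below shows one plausible realization: a sparse auto-encoder is pre-trained on unlabeled acoustic feature vectors, and its encoder then feeds a supervised multi-label classifier trained on the smaller labeled subset. This is a minimal sketch, not the authors' implementation; the feature dimension, layer sizes, sparsity weight, number of context labels, and the use of PyTorch are all assumptions made for the example.

# Hedged sketch (not the paper's exact method): unsupervised sparse
# auto-encoder pre-training on unlabeled audio feature vectors, followed by
# supervised multi-label training of a social-context classifier on the
# smaller labeled subset. All sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN_DIM, NUM_CONTEXTS = 40, 256, 5  # assumed dimensions

class SparseAutoEncoder(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.decoder = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

def pretrain(ae, unlabeled_batches, epochs=10, sparsity_weight=1e-3, lr=1e-3):
    # Minimize reconstruction error plus an L1 sparsity penalty on the code.
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        for x in unlabeled_batches:
            recon, h = ae(x)
            loss = nn.functional.mse_loss(recon, x) + sparsity_weight * h.abs().mean()
            opt.zero_grad(); loss.backward(); opt.step()

def train_classifier(encoder, labeled_batches, epochs=10, lr=1e-3):
    # Supervised multi-label head on top of the frozen learned features:
    # one sigmoid output per social-context label.
    clf = nn.Linear(HIDDEN_DIM, NUM_CONTEXTS)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in labeled_batches:
            with torch.no_grad():
                feats = encoder(x)
            logits = clf(feats)
            loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
            opt.zero_grad(); loss.backward(); opt.step()
    return clf

if __name__ == "__main__":
    # Toy stand-ins for real per-frame acoustic features and context labels.
    unlabeled = [torch.randn(64, FEAT_DIM) for _ in range(20)]
    labeled = [(torch.randn(64, FEAT_DIM),
                torch.randint(0, 2, (64, NUM_CONTEXTS)).float()) for _ in range(5)]
    ae = SparseAutoEncoder(FEAT_DIM, HIDDEN_DIM)
    pretrain(ae, unlabeled)
    classifier = train_classifier(ae.encoder, labeled)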
Keywords :
audio recording; audio signal processing; feature extraction; learning (artificial intelligence); neural nets; signal classification; audio recordings; background noise; classifier; deep neural networks; labeled data; learning models; life-logs; multilabel classification; representative noise; social contexts detection; sparse auto-encoders; supervised learning; unsupervised feature extraction; Accuracy; Context; Feature extraction; Harmonic analysis; Speech; Training; Vectors; Deep neural networks; Harmonic model; Multi-label classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on
Conference_Location :
Reims
Type :
conf
DOI :
10.1109/MLSP.2014.6958853
Filename :
6958853