DocumentCode
155613
Title
Inferring social contexts from audio recordings using deep neural networks
Author
Asgari, M. ; Shafran, Izhak ; Bayestehtashk, Alireza
Author_Institution
Center for Spoken Language Understanding, Oregon Health & Sci. Univ., Portland, OR, USA
fYear
2014
fDate
21-24 Sept. 2014
Firstpage
1
Lastpage
6
Abstract
In this paper, we investigate the problem of detecting social contexts from audio recordings of everyday life, such as life-logs. Unlike standard corpora of telephone speech or broadcast news, these recordings contain a wide variety of background noise. In such applications, it is inherently difficult to collect and label all the representative noise needed to learn models in a fully supervised manner. The amount of labeled data that can be expected is relatively small compared to the available recordings. This setting lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
Keywords
audio recording; audio signal processing; feature extraction; learning (artificial intelligence); neural nets; signal classification; audio recordings; background noise; classifier; deep neural networks; labeled data; learning models; life-logs; multilabel classification; representative noise; social contexts detection; sparse auto-encoders; supervised learning; unsupervised feature extraction; Accuracy; Context; Feature extraction; Harmonic analysis; Speech; Training; Vectors; Deep neural networks; Harmonic model; Multi-label classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on
Conference_Location
Reims
Type
conf
DOI
10.1109/MLSP.2014.6958853
Filename
6958853