• DocumentCode
    155613
  • Title
    Inferring social contexts from audio recordings using deep neural networks
  • Author
    Asgari, M.; Shafran, Izhak; Bayestehtashk, Alireza
  • Author_Institution
    Center for Spoken Language Understanding, Oregon Health & Sci. Univ., Portland, OR, USA
  • fYear
    2014
  • fDate
    21-24 Sept. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we investigate the problem of detecting social contexts from the audio recordings of everyday life such as in life-logs. Unlike the standard corpora of telephone speech or broadcast news, these recordings have a wide variety of background noise. By nature, in such applications, it is difficult to collect and label all the representative noise for learning models in a fully supervised manner. The amount of labeled data that can be expected is relatively small compared to the available recordings. This lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
  • Keywords
    audio recording; audio signal processing; feature extraction; learning (artificial intelligence); neural nets; signal classification; audio recordings; background noise; classifier; deep neural networks; labeled data; learning models; life-logs; multilabel classification; representative noise; social contexts detection; sparse auto-encoders; supervised learning; unsupervised feature extraction; Accuracy; Context; Feature extraction; Harmonic analysis; Speech; Training; Vectors; Deep neural networks; Harmonic model; Multi-label classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on
  • Conference_Location
    Reims
  • Type
    conf
  • DOI
    10.1109/MLSP.2014.6958853
  • Filename
    6958853