• DocumentCode
    3420789
  • Title
    Detecting semantic concepts in consumer videos using audio
  • Author
    Junwei Liang; Qin Jin; Xixi He; Gang Yang; Jieping Xu; Xirong Li
  • Author_Institution
    Multimedia Comput. Lab., Renmin Univ. of China, Beijing, China
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    2279
  • Lastpage
    2283
  • Abstract
    With the increasing use of audio sensors in user-generated content collection, detecting semantic concepts from audio streams has become an important research problem. In this paper, we present a semantic concept annotation system that uses the soundtracks (audio) of videos. We investigate three different acoustic feature representations for audio semantic concept annotation and explore fusing audio annotation with visual annotation systems. We test our system on the data collection from the HUAWEI Accurate and Fast Mobile Video Annotation Grand Challenge 2014. The experimental results show that our audio-only concept annotation system detects semantic concepts significantly better than random guessing. It also provides significant complementary information that boosts the performance of the visual-based concept annotation system. Further detailed analysis shows that, for a semantic concept that is interpreted both visually and acoustically, it is better to train the concept models for the visual and audio systems separately, using visual-driven and audio-driven ground truth, respectively.
  • Keywords
    audio signal processing; audio streaming; feature extraction; image representation; sensor fusion; video signal processing; HUAWEI accurate and fast mobile video annotation grand challenge 2014; acoustic feature representation; audio annotation fusion; audio semantic concept annotation; audio sensor; audio streaming; audio-only concept annotation system; data collection; train concept model; user generated content collection; visual annotation system; visual-based concept annotation system; Audio systems; Feature extraction; Semantics; Speech; Videos; Visual systems; Visualization; Audio Concept Analysis; Semantic Concept Annotation; Video Content Analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178377
  • Filename
    7178377