• DocumentCode
    3370908
  • Title
    An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content
  • Author
    Elizalde, Benjamin ; Lei, Haozhen ; Friedland, Gerald
  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA, USA
  • fYear
    2013
  • fDate
    9-11 Dec. 2013
  • Firstpage
    114
  • Lastpage
    117
  • Abstract
    Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly through computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in Speaker Verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system, showing a slight improvement in combination over the standalone systems.
  • Keywords
    Gaussian processes; audio signal processing; image representation; mixture models; object detection; speaker recognition; video signal processing; FA rates; GMM-based approach; MD rate; RF-based systems; UGC audio characteristics; acoustic environments; acoustic variability; audio-based VED; audio-based event detection; audio-based video event detection; birthday party; computer vision; conventional GMM-based systems; conventional Gaussian mixture model; data structure; false alarm rates; i-vector representation; i-vector system; i-vector-based system; information extraction; missed detection rate; observable event; random forest-based systems; speaker verification; standalone system; user generated content; video content analysis; wedding ceremony; Acoustics; Event detection; Multimedia communication; Radio frequency; Streaming media; Training; Vectors; Audio; User Generated Content; Video Event Detection; i-vector;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia (ISM), 2013 IEEE International Symposium on
  • Conference_Location
    Anaheim, CA
  • Print_ISBN
    978-0-7695-5140-1
  • Type
    conf
  • DOI
    10.1109/ISM.2013.27
  • Filename
    6746778