  • DocumentCode
    1653751
  • Title
    N-gram extension for bag-of-audio-words
  • Author
    Pancoast, Stephanie ; Akbacak, Murat
  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    2013
  • Firstpage
    778
  • Lastpage
    782
  • Abstract
    Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each “word” is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
  • Keywords
    audio systems; codes; multimedia communication; pattern clustering; probability; TRECVID 2011 dataset; TRECVID 2012 dataset; audio word N-gram representation extension; average probability; bag-of-audio-word generation; clustering algorithm; codebook; document representation; multimedia event detection; natural language; Event detection; Histograms; Multimedia communication; NIST; Natural languages; Vectors; Videos; Bag-of-audio-words; N-gram models; multimedia event detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6637754
  • Filename
    6637754
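
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the features, the clustering algorithm, or the normalization used, so the nearest-codeword quantizer, the toy codebook, and the function names `quantize` and `ngram_histogram` below are all assumptions made for illustration.

```python
from collections import Counter


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword.

    Assumption: squared Euclidean distance to a fixed codebook, which in
    practice would come from a clustering step such as k-means.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(frame, codebook[k]))
        for frame in frames
    ]


def ngram_histogram(words, n):
    """Return a normalized histogram over consecutive audio-word n-grams.

    With n == 1 this reduces to the plain bag-of-audio-words histogram;
    with n > 1 each bin records a word together with its neighbors.
    """
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


# Hypothetical toy data: a 3-word codebook and five 2-D feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, 0.0), (0.9, 1.2), (4.8, 5.1), (0.2, -0.1), (1.1, 0.9)]

words = quantize(frames, codebook)      # sequence of audio-word indices
unigrams = ngram_histogram(words, 1)    # bag-of-audio-words
bigrams = ngram_histogram(words, 2)     # bigram extension
```

For a codebook of K words there are up to K^n distinct n-grams, so the histogram dimension grows quickly with both n and K. This is one reason the choice of codebook size interacts with the N-gram representation.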