• DocumentCode
    784477
  • Title

    Text-Like Segmentation of General Audio for Content-Based Retrieval

  • Author

    Lu, Lie ; Hanjalic, Alan

  • Author_Institution
    Microsoft Res. Asia, Beijing
  • Volume
    11
  • Issue
    4
  • fYear
    2009
  • fDate
    6/1/2009 12:00:00 AM
  • Firstpage
    658
  • Lastpage
    669
  • Abstract
    Automatic detection of (semantically) meaningful audio segments, or audio scenes, is an important step in high-level semantic inference from general audio signals, and can benefit various content-based applications involving both audio and multimodal (multimedia) data sets. Motivated by the known limitations of traditional low-level feature-based approaches, we propose in this paper a novel approach to discovering audio scenes, based on an analysis of audio elements and key audio elements, which can be seen as equivalents of the words and keywords in a text document, respectively. In the proposed approach, an audio track is seen as a sequence of audio elements, and the presence of an audio scene boundary at a given time stamp is checked by measuring the pairwise semantic affinity between different parts of the analyzed audio stream surrounding that time stamp. Our proposed model for semantic affinity exploits proven concepts from text document analysis, and is introduced here as a function of the distance between the audio parts considered, and of the co-occurrence statistics and the importance weights of the audio elements contained therein. Experimental evaluation performed on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than the traditional low-level feature-based approaches in solving the posed audio scene segmentation problem.
  • Keywords
    content-based retrieval; multimedia systems; text analysis; audio elements analysis; automatic detection; co-occurrence statistics; high-level semantic inference; key audio elements analysis; posed audio scene segmentation problem; semantic affinity; text-like segmentation; audio element; audio scene; audio scene segmentation; content-based audio analysis
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2009.2017607
  • Filename
    4895314