• DocumentCode
    900452
  • Title

    A flexible framework for key audio effects detection and auditory context inference

  • Author

    Cai, Rui ; Lu, Lie ; Hanjalic, Alan ; Zhang, Hong-Jiang ; Cai, Lian-Hong

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    14
  • Issue
    3
  • fYear
    2006
  • fDate
    5/1/2006 12:00:00 AM
  • Firstpage
    1026
  • Lastpage
    1039
  • Abstract
    Key audio effects are those special effects that play critical roles in human perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect the various models and fully explore the transitions among them. Moreover, a set of new spectral features is employed to improve the representation of each audio effect and the discrimination among various effects. The framework makes it convenient to add or remove target audio effects for different applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework achieves satisfactory results on both key audio effect detection and auditory context inference.
  • Keywords
    audio signal processing; belief networks; hidden Markov models; inference mechanisms; statistical analysis; Bayesian network-based approach; auditory context inference; continuous audio stream; grammar network; hidden Markov models; key audio effects detection; semantic inference; statistical learning; Acoustic noise; Asia; Bayesian methods; Context modeling; Hidden Markov models; Layout; Motion pictures; Speech; Statistical learning; Streaming media; Audio content analysis; Bayesian network; auditory context; flexible framework; grammar network; key audio effect; multi-background model;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TSA.2005.857575
  • Filename
    1621215