• DocumentCode
    740086
  • Title

    Spectro-Temporal Gabor Filterbank Features for Acoustic Event Detection

  • Author

    Schroder, Jens ; Goetze, Stefan ; Anemuller, Jorn

  • Author_Institution
    Department Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg, Germany
  • Volume
    23
  • Issue
    12
  • fYear
    2015
  • Firstpage
    2198
  • Lastpage
    2208
  • Abstract
    Algorithms for the automatic detection and recognition of acoustic events are increasingly gaining relevance for the reliable and robust functioning of consumer, assistive and monitoring systems. The extraction of appropriate task relevant acoustic features from the raw sound signal clearly influences performance of subsequent statistical classification, in particular in adverse acoustic situations. The present contribution investigates the use of biologically-inspired features, derived from a filterbank of two-dimensional Gabor functions, that decompose the spectro-temporal power density into components which capture spectral, temporal and joint spectro-temporal modulation patterns. It is hypothesized that the comparably large joint spectral and temporal extent of these Gabor functions results in features that allow for robust classification. Evaluation of the proposed feature extraction scheme together with an hidden Markov model (HMM) classifier is conducted on two corpora comprising acoustic events in realistic adverse conditions from the D-CASE and CLEAR’07 evaluation campaigns. Relevance of each Gabor filter for classification is analyzed and an optimized parameter set for the Gabor filterbank (GFB) is identified. Performance of the optimized GFB is evaluated in comparison to other state-of-the-art algorithms on isolated event classification and on the full acoustic event detection (AED) including joint classification and temporal segmentation of events. Results show that Gabor features result in a signal representation that exhibits separated average class-specific patterns. An improvement in classification accuracy of up to 26% relative to the Mel-frequency cepstral coefficient (MFCC) baseline is obtained with the optimized GFB. Further experiments demonstrate that this improvement cannot be explained by purely temporal or purely spectral Gabor basis functions. Rather, a GFB with features extending in joint spectro-temporal directions is required to- obtain optimum performance. Performance on AED with the D-CASE challenge dataset is shown to improve on previous algorithms from the recent literature.
  • Keywords
    Acoustics; Feature extraction; Frequency modulation; Hidden Markov models; Speech; Speech processing; Acoustic event detection (AED); Gabor filterbank; spectro-temporal filters;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    ieee
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2467964
  • Filename
    7194756