• DocumentCode
    1652377
  • Title
    Automatic acoustic siren detection in traffic noise by part-based models
  • Author
    Schroder, Jochen ; Goetze, Stefan ; Grutzmacher, Volker ; Anemuller, Jorn
  • Author_Institution
    Hearing, Speech & Audio Technol., Fraunhofer IDMT, Oldenburg, Germany
  • fYear
    2013
  • Firstpage
    493
  • Lastpage
    497
  • Abstract
    State-of-the-art classifiers like hidden Markov models (HMMs) in combination with mel-frequency cepstral coefficients (MFCCs) are flexible in time but rigid in the spectral dimension. In contrast, part-based models (PBMs), originally proposed in computer vision, consist of parts in a fully deformable configuration. The present contribution proposes to employ PBMs in the spectro-temporal domain for detection of emergency siren sounds in traffic noise, resulting in a classifier that is robust to shifts in frequency induced, e.g., by Doppler-shift effects. Two improvements over standard machine learning techniques for PBM estimation are proposed: (i) spectro-temporal part (“appearance”) extraction is initialized by interest point detection instead of random initialization, and (ii) a discriminative training approach is implemented in addition to standard generative training. Evaluation with self-recorded police sirens and traffic noise gathered on-line demonstrates that PBMs are successful in acoustic siren detection. One hand-labeled and two machine-learned PBMs are compared to standard HMMs employing mel-spectrograms and MFCCs in clean and multi-condition (multiple SNR) training settings. Results show that in clean-condition training, hand-labeled PBMs and HMMs outperform machine-learned PBMs already for test data with moderate additive noise. In multi-condition training, the machine-learned PBMs outperform HMMs on most SNRs, achieving high accuracies and being nearly optimal up to 5 dB SNR. Thus, our simulation results show that PBMs are a promising approach for acoustic event detection (AED).
  • Keywords
    acoustic signal detection; hidden Markov models; learning (artificial intelligence); signal classification; HMM; MFCC; acoustic event detection; automatic acoustic siren detection; computer vision; discriminative training approach; hidden Markov models; machine-learned PBM; mel-frequency cepstral coefficients; part-based models; self-recorded police sirens; spectro-temporal part extraction; standard generative training; standard machine learning techniques; state-of-the-art classifiers; traffic noise evaluation; Accuracy; Computational modeling; Hidden Markov models; Signal to noise ratio; Training; Vehicles; acoustic event detection (AED); part-based model (PBM); siren detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6637696
  • Filename
    6637696