Environmental Sound Recognition With Time–Frequency Audio Features

Author

Chu, Selina; Narayanan, Shrikanth; Kuo, C.-C. Jay

Author_Institution

Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA

Volume

17

Issue

6

fYear

2009

Firstpage

1142

Lastpage

1158

Abstract

The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as the chirping of insects and the sound of rain, which are typically noise-like with a broad, flat spectrum, may include strong temporal-domain signatures. However, only a few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose to use the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method utilizes a dictionary of atoms for feature selection, resulting in a flexible, intuitive, and physically interpretable set of features. The MP-based features are adopted to supplement the MFCC features to yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system is shown to produce performance comparable to that of human listeners.
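
The matching pursuit step named in the abstract can be illustrated with a minimal Python sketch. This is a generic greedy MP over a tiny dictionary, not the authors' implementation: the Gabor-style atom parameterization, the helper names (gabor_atom, matching_pursuit), and the toy dictionary sizes are all assumptions chosen for illustration.

```python
import math

def gabor_atom(n, scale, freq, shift):
    """Unit-norm real atom: Gaussian window times a cosine.
    The (scale, freq, shift) parameterization is an assumption for this sketch."""
    atom = [math.exp(-math.pi * ((t - shift) / scale) ** 2)
            * math.cos(2 * math.pi * freq * (t - shift))
            for t in range(n)]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: at each step pick the atom with the largest absolute
    inner product with the residual, then subtract its contribution."""
    residual = list(signal)
    picks = []  # list of (atom index, coefficient)
    for _ in range(n_atoms):
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(corrs[k]))
        coef = corrs[best]
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
        picks.append((best, coef))
    return picks, residual

# Toy dictionary (hypothetical sizes): 2 scales x 2 frequencies x 2 shifts.
params = [(s, f, u) for s in (4, 16) for f in (0.05, 0.2) for u in (8, 24)]
dictionary = [gabor_atom(32, s, f, u) for (s, f, u) in params]

# Test signal: exactly 3 times atom number 3, so MP should recover it first.
signal = [3.0 * a for a in dictionary[3]]
picks, residual = matching_pursuit(signal, dictionary, 3)
first_index, first_coef = picks[0]
print(first_index, params[first_index], round(first_coef, 6))
```

The scale, frequency, and shift of each selected atom (here `params[first_index]`) are the kind of physically interpretable quantities from which time-frequency features could be derived. A real system would use a much larger dictionary and frame-wise decomposition.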

Keywords

audio signal processing; pattern recognition; time-frequency analysis; Mel-frequency cepstral coefficients; audio sensor; broad flat spectrum; human recognition; matching pursuit algorithm; sound classification; sound recognition; temporal domain signatures; time-frequency audio features; Acoustic noise; Acoustic sensors; Cepstral analysis; Chirp; Humans; Insects; Layout; Matching pursuit algorithms; Rain; Spectral shape; Audio classification; Mel-frequency cepstral coefficient (MFCC); auditory scene recognition; data representation; feature extraction; feature selection; matching pursuit;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2009.2017438

Filename

5109766