Title :
Environmental Sound Recognition With Time–Frequency Audio Features
Author :
Chu, Selina ; Narayanan, Shrikanth ; Kuo, C.-C. Jay
Author_Institution :
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA
Abstract :
The paper considers the task of recognizing environmental sounds in order to understand the scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as the chirping of insects and the sound of rain, which are typically noise-like with a broad, flat spectrum, may include strong temporal-domain signatures. However, only a few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose using the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method uses a dictionary of atoms for feature selection, resulting in a flexible, intuitive, and physically interpretable set of features. The MP-based features are adopted to supplement the MFCC features, yielding higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system is shown to perform comparably to human listeners.
Keywords :
audio signal processing; pattern recognition; time-frequency analysis; Mel-frequency cepstral coefficients; audio sensor; broad flat spectrum; human recognition; matching pursuit algorithm; sound classification; sound recognition; temporal domain signatures; time-frequency audio features; Acoustic noise; Acoustic sensors; Cepstral analysis; Chirp; Humans; Insects; Layout; Matching pursuit algorithms; Rain; Spectral shape; Audio classification; Mel-frequency cepstral coefficient (MFCC); auditory scene recognition; data representation; feature extraction; feature selection; matching pursuit;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2017438