• DocumentCode
    1489004
  • Title

    Time–Frequency Matrix Feature Extraction and Classification of Environmental Audio Signals

  • Author

    Ghoraani, Behnaz ; Krishnan, Sridhar

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Ryerson Univ., Toronto, ON, Canada
  • Volume
    19
  • Issue
    7
  • fYear
    2011
  • Firstpage
    2197
  • Lastpage
    2209
  • Abstract
    Audio feature extraction and classification are important tools for audio signal analysis in many applications, such as multimedia indexing and retrieval, and auditory scene analysis. However, because of the nonstationarities and discontinuities that exist in these signals, their quantification and classification remain a formidable challenge. In this paper, we develop a new approach for audio feature extraction to effectively quantify these nonstationarities in an attempt to achieve high classification accuracy for environmental audio signals. Our approach consists of three stages: first, we construct the time-frequency matrix (TFM) of audio signals using the matching-pursuit time-frequency distribution (MP-TFD) technique; then, we apply the non-negative matrix decomposition (NMF) technique to decompose the TFM into its significant components. Finally, we propose seven novel features derived from the spectral and temporal structures of the decomposed vectors, so that they represent the joint time-frequency (TF) structure of the audio signal, and combine them with Mel-frequency cepstral coefficient (MFCC) features. These features are examined using a database of 192 environmental audio signals that includes 20 aircraft, 17 helicopter, 20 drum, 15 flute, 20 piano, 20 animal, 20 bird, and 20 insect sounds, and the speech of 20 males and 20 females. The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification, with over 10% accuracy-rate improvement compared to the MFCC features alone.
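The decomposition stage described in the abstract can be illustrated with a minimal sketch. It assumes the TFM is already available as a non-negative matrix V (frequency by time) and factors it as V ≈ WH using the standard Lee-Seung multiplicative updates. The abstract does not say which NMF algorithm the authors used, so this choice is an assumption, as are the function names (`nmf`, `frobenius_error`), the rank, the iteration count, and the toy matrix. The MP-TFD construction and the seven proposed features are not reproduced here.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (F x T) into W (F x rank) and H (rank x T).

    Assumption: Lee-Seung multiplicative updates for the Frobenius norm.
    Columns of W play the role of spectral vectors, rows of H temporal ones.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Hypothetical toy TFM: two spectral patterns active in disjoint time slots.
tfm = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]
w, h = nmf(tfm, rank=2)
print("reconstruction error:", frobenius_error(tfm, w, h))
```

In a pipeline like the one the abstract outlines, features would then be computed from the columns of `w` and the rows of `h`. The specific features are defined in the paper and are not guessed at here.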
  • Keywords
    audio signal processing; feature extraction; iterative methods; signal classification; time-frequency analysis; audio signal analysis; auditory scene analysis; environmental audio signal classification; feature extraction; matching-pursuit; mel-frequency cepstral coefficients; multimedia indexing; multimedia retrieval; nonnegative matrix decomposition; time-frequency distribution; time-frequency matrix; Dictionaries; Equations; Feature extraction; Joints; Matrix decomposition; Speech; Time frequency analysis; Environmental audio classification; matching pursuit time–frequency distribution; non-negative matrix factorization (NMF); time–frequency matrix feature extraction; time–frequency quantification;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2011.2118753
  • Filename
    5742784