Title :
Co-clustering for Auditory Scene Categorization
Author :
Cai, Rui ; Lu, Lie ; Hanjalic, Alan
Author_Institution :
Microsoft Res. Asia, Beijing
Date :
6/1/2008
Abstract :
Auditory scenes are temporal audio segments with coherent semantic content. Automatically classifying and grouping auditory scenes with similar semantics into categories benefits many multimedia applications, such as semantic event detection and indexing. For such semantic categorization, auditory scenes are first characterized with either low-level acoustic features or mid-level representations such as audio effects; supervised classifiers or unsupervised clustering algorithms are then employed to group scene segments into semantic categories. In this paper, we focus on the problem of automatically categorizing auditory scenes in an unsupervised manner. To achieve more reasonable clustering results, we introduce a co-clustering scheme that exploits potential grouping trends among different dimensions of the feature space (either low-level or mid-level) and provides a more accurate similarity measure for comparing auditory scenes. We also extend the co-clustering scheme with a strategy based on the Bayesian information criterion (BIC) to automatically estimate the numbers of clusters. An evaluation performed on 272 auditory scenes extracted from 12 h of audio data shows very encouraging categorization results. Co-clustering outperformed some traditional one-way clustering algorithms, both on the low-level acoustic features and on the mid-level audio effect representations. Finally, we present our vision regarding the applicability of this approach to general multimedia data, and show some preliminary results on content-based image clustering.
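As an illustration of the co-clustering idea described in the abstract, the following is a minimal sketch that groups scenes (rows) and feature dimensions (columns) at the same time. The abstract does not specify the paper's actual formulation, so this sketch assumes a simple squared-error, block-mean variant with a deterministic farthest-point initialization. The toy matrix `X`, the function names, and the hand-picked cluster counts `k` and `l` are all hypothetical; the BIC-based estimation of the numbers of clusters is not shown.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_point_labels(vectors, k):
    """Deterministic initial labels: pick k spread-out prototypes,
    then assign every vector to its nearest prototype."""
    protos = [vectors[0]]
    while len(protos) < k:
        protos.append(max(vectors,
                          key=lambda v: min(sq_dist(v, p) for p in protos)))
    return [min(range(k), key=lambda c: sq_dist(v, protos[c]))
            for v in vectors]


def block_means(X, rows, cols, k, l):
    """Mean value of each (row cluster, column cluster) block."""
    total = [[0.0] * l for _ in range(k)]
    count = [[0] * l for _ in range(k)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            total[r][c] += X[i][j]
            count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(l)] for r in range(k)]


def co_cluster(X, k, l, n_iter=20):
    """Alternate row and column reassignment against the block means
    (an assumed squared-error variant, not the paper's method)."""
    Xt = [list(col) for col in zip(*X)]  # columns as vectors
    rows = farthest_point_labels(X, k)
    cols = farthest_point_labels(Xt, l)
    for _ in range(n_iter):
        M = block_means(X, rows, cols, k, l)
        new_rows = [min(range(k),
                        key=lambda r: sum((x[j] - M[r][cols[j]]) ** 2
                                          for j in range(len(cols))))
                    for x in X]
        M = block_means(X, new_rows, cols, k, l)
        new_cols = [min(range(l),
                        key=lambda c: sum((y[i] - M[new_rows[i]][c]) ** 2
                                          for i in range(len(new_rows))))
                    for y in Xt]
        if new_rows == rows and new_cols == cols:
            break  # assignments are stable
        rows, cols = new_rows, new_cols
    return rows, cols


# Hypothetical data: 6 scenes x 4 feature dimensions (e.g. audio effect counts).
X = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [5, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [0, 0, 5, 5],
]
rows, cols = co_cluster(X, k=2, l=2)
print("scene clusters:  ", rows)
print("feature clusters:", cols)
```

On this toy matrix the first three scenes fall into one cluster and the last three into the other, while the four feature dimensions split into two pairs. Comparing scenes through the grouped feature dimensions, rather than through every raw dimension, is the intuition behind the similarity measure mentioned in the abstract.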
Keywords :
audio signal processing; belief networks; image classification; image segmentation; multimedia computing; pattern clustering; unsupervised learning; Bayesian information criterion; auditory scene categorization; co-clustering scheme; coherent semantic content; content-based image clustering; multimedia application; semantic event detection; semantic event indexing; temporal audio segment; unsupervised clustering algorithm; Bayesian methods; Clustering algorithms; Discrete Fourier transforms; Event detection; Extraterrestrial measurements; Indexing; Layout; Linear predictive coding; Mel frequency cepstral coefficient; Motion pictures; Audio content analysis; co-clustering; local feature grouping trends
Journal_Title :
IEEE Transactions on Multimedia
DOI :
10.1109/TMM.2008.921739