Title :
A generic classification system for multi-channel audio indexing: Application to speech and music detection
Author :
Benaroya, Elie-Laurent ; Peeters, G.
Author_Institution :
STMS IRCAM, Sound Anal./Synthesis Team, UPMC, Paris, France
Abstract :
The number of 3D audio-visual productions and archives is rising, which creates a need for indexing 3D content. Event detection from the audio modality is a difficult task. The standard way to classify 3D audio is to first down-mix it to mono and classify the resulting signal. In this paper, we describe a generic classifier for multi-channel audio event detection and propose several information fusion strategies. Our system is evaluated on a speech and music detection task using the audio of 3D movies. Compared to the standard down-mixing method, we improve classification performance on our database by 1.5% for speech detection and 8% for music detection. We also compare several information fusion methods in the experiments.
Keywords :
audio signal processing; indexing; sensor fusion; speech processing; three-dimensional television; 3D audio classification; 3D audio-visual productions; 3D contents; 3D movies; audio modality; generic classification system; information fusion; mono audio; multichannel audio event detection; multichannel audio indexing; music detection; speech detection; standard downmixing method; Feature extraction; Frequency measurement; MONOS devices; Motion pictures; Speech; Support vector machines; Three-dimensional displays;
Conference_Title :
2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)
Conference_Location :
Paris
DOI :
10.1109/WIAMIS.2013.6616160