DocumentCode
2931398
Title
A two phase method for general audio segmentation
Author
Zhang, Jessie Xin ; Whalley, Jacqueline ; Brooks, Stephen
Author_Institution
Sch. of Comput. & Math. Sci., Auckland Univ. of Technol., Auckland, New Zealand
fYear
2009
fDate
June 28 2009-July 3 2009
Firstpage
626
Lastpage
629
Abstract
This paper presents a model-free, training-free two-phase method for audio segmentation that separates monophonic heterogeneous audio files into acoustically homogeneous regions, each containing a single sound. In the first phase, a rough segmentation splits the audio input into clips using silence detection in the time domain. In the second phase, a self-similarity matrix is computed from selected frequency-domain audio features to measure the similarity between frames within each clip, and an edge detection method is then applied to the resulting similarity image to find regions that correspond to plausible sounds. The results of the two phases are combined to form the final boundaries for the input audio. The method is evaluated using established methods and a standard non-musical database, and it offers more accurate segmentation results than existing audio segmentation methods. We propose that this approach could be adapted as an efficient preprocessing stage in other audio processing systems, such as audio retrieval, classification, music analysis and summarization.
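The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, thresholds, or edge detector, so everything below is an assumption made for illustration. Silence is detected with a short-time energy threshold, the "selected audio features" are replaced by normalized magnitude spectra compared with cosine similarity, and the edge detection step is swapped for a checkerboard-kernel novelty score along the diagonal of the similarity matrix. All function names and parameter values (`split_on_silence`, `self_similarity`, `novelty_boundaries`, `segment`, the frame sizes and thresholds) are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:  # pad very short inputs to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def split_on_silence(x, frame_len=256, hop=256, rel_threshold=0.01):
    """Phase 1 (assumed form): (start, end) sample ranges of non-silent clips."""
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).mean(axis=1)
    # A frame is "active" if its energy exceeds a fraction of the peak energy.
    active = energy > rel_threshold * energy.max()
    clips, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            clips.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        clips.append((start * hop, (len(active) - 1) * hop + frame_len))
    return clips

def self_similarity(clip, frame_len=256, hop=128):
    """Phase 2a (assumed features): cosine similarity of magnitude spectra."""
    frames = frame_signal(clip, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    spec = spec / (np.linalg.norm(spec, axis=1, keepdims=True) + 1e-12)
    return spec @ spec.T

def novelty_boundaries(S, half_width=4, threshold=0.25):
    """Phase 2b (stand-in for edge detection): checkerboard-kernel novelty.

    Returns frame indices where the novelty score is a local maximum
    above an absolute threshold (the score lies in [-0.5, 0.5]).
    """
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    kernel = np.outer(sign, sign)  # +1 within blocks, -1 across blocks
    n = S.shape[0]
    nov = np.zeros(n)
    for i in range(half_width, n - half_width + 1):
        window = S[i - half_width : i + half_width, i - half_width : i + half_width]
        nov[i] = (window * kernel).sum() / kernel.size
    return [i for i in range(1, n - 1)
            if nov[i] > threshold and nov[i] >= nov[i - 1] and nov[i] > nov[i + 1]]

def segment(x, frame_len=256, hop=128):
    """Combine both phases into a sorted list of boundary sample positions."""
    x = np.asarray(x, dtype=float)
    boundaries = set()
    for start, end in split_on_silence(x):
        boundaries.update((start, end))
        S = self_similarity(x[start:end], frame_len, hop)
        boundaries.update(start + i * hop for i in novelty_boundaries(S))
    return sorted(boundaries)
```

Calling `segment(x)` on a mono signal `x` returns clip edges from the silence pass together with any within-clip boundaries found in the similarity matrix. The thresholds here are arbitrary and would need tuning on real recordings.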
Keywords
audio databases; audio signal processing; content-based retrieval; music; audio classification; audio clips; audio processing systems; audio retrieval; edge detection method; general audio segmentation; monophonic heterogeneous audio files; music analysis; music summarization; self-similarity matrix; silence detection; standard nonmusical database; time domain; two phase method; Acoustic signal detection; Artificial neural networks; Audio databases; Computational Intelligence Society; Computer science; Hidden Markov models; Image edge detection; Mathematical model; Music information retrieval; Speech; Audio segmentation; edge detection; similarity map;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
Conference_Location
New York, NY
ISSN
1945-7871
Print_ISBN
978-1-4244-4290-4
Electronic_ISBN
1945-7871
Type
conf
DOI
10.1109/ICME.2009.5202574
Filename
5202574