• DocumentCode
    2931398
  • Title
    A two phase method for general audio segmentation
  • Author
    Zhang, Jessie Xin; Whalley, Jacqueline; Brooks, Stephen
  • Author_Institution
    Sch. of Comput. & Math. Sci., Auckland Univ. of Technol., Auckland, New Zealand
  • fYear
    2009
  • fDate
    June 28 2009-July 3 2009
  • Firstpage
    626
  • Lastpage
    629
  • Abstract
    This paper presents a model-free, training-free two-phase method for audio segmentation that separates monophonic heterogeneous audio files into acoustically homogeneous regions, each containing a single sound. In the first phase, a rough segmentation splits the audio input into clips based on silence detection in the time domain. In the second phase, a self-similarity matrix is computed from selected frequency-domain audio features to measure the similarity between frames within each clip, and an edge detection method is then applied to the resulting similarity image to find regions that correspond to plausible sounds. The results of the two phases are combined to form the final boundaries for the input audio. The method is evaluated using established methods and a standard non-musical database, and it offers more accurate segmentation results than existing audio segmentation methods. We propose that this approach could be adapted as an efficient preprocessing stage in other audio processing systems, such as audio retrieval, classification, music analysis and summarization.
  • Keywords
    audio databases; audio signal processing; content-based retrieval; music; audio classification; audio clips; audio processing systems; audio retrieval; edge detection method; general audio segmentation; monophonic heterogeneous audio files; music analysis; music summarization; self-similarity matrix; silence detection; standard nonmusical database; time domain; two phase method; Acoustic signal detection; Artificial neural networks; Audio databases; Computational Intelligence Society; Computer science; Hidden Markov models; Image edge detection; Mathematical model; Music information retrieval; Speech; Audio segmentation; edge detection; similarity map
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
  • Conference_Location
    New York, NY
  • ISSN
    1945-7871
  • Print_ISBN
    978-1-4244-4290-4
  • Electronic_ISBN
    1945-7871
  • Type
    conf
  • DOI
    10.1109/ICME.2009.5202574
  • Filename
    5202574
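
The two-phase pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the features, thresholds, or edge detector used, so the sketch assumes short-time energy for silence detection, cosine similarity of magnitude spectra for the self-similarity matrix, and a checkerboard kernel slid along the matrix diagonal as a stand-in for edge detection. All function names and parameter values (`silence_split`, `self_similarity`, `novelty_boundaries`, `two_phase_segment`, `half_width`, the thresholds) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal (len(x) >= frame_len) into overlapping frames, one per row."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]


def silence_split(x, frame_len=256, hop=128, rel_threshold=0.05):
    """Phase 1 (assumed variant): time-domain silence detection on frame energy.

    Returns (start, end) sample ranges of the non-silent clips.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).mean(axis=1)
    active = energy > rel_threshold * energy.max()
    # Pad with False so every active run has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    change = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = change[0::2], change[1::2]
    return [(int(s * hop), int((e - 1) * hop + frame_len))
            for s, e in zip(starts, ends)]


def self_similarity(x, frame_len=256, hop=128):
    """Phase 2a (assumed features): cosine similarity of magnitude spectra."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    unit = spec / (np.linalg.norm(spec, axis=1, keepdims=True) + 1e-12)
    return unit @ unit.T


def novelty_boundaries(ssm, half_width=8, threshold=0.25):
    """Phase 2b (stand-in for edge detection): checkerboard-kernel novelty.

    Returns frame indices where the similarity image changes block structure.
    """
    n = ssm.shape[0]
    sign = np.concatenate((-np.ones(half_width), np.ones(half_width)))
    kernel = np.outer(sign, sign)  # +1 within a segment, -1 across segments
    novelty = np.zeros(n)
    for i in range(half_width, n - half_width):
        block = ssm[i - half_width:i + half_width, i - half_width:i + half_width]
        novelty[i] = (block * kernel).mean()
    # Keep local maxima whose response exceeds the absolute threshold.
    return [i for i in range(1, n - 1)
            if novelty[i] >= threshold
            and novelty[i] > novelty[i - 1]
            and novelty[i] >= novelty[i + 1]]


def two_phase_segment(x, frame_len=256, hop=128):
    """Combine silence-based clip edges with within-clip similarity boundaries."""
    boundaries = set()
    for start, end in silence_split(x, frame_len, hop):
        boundaries.update((start, end))
        ssm = self_similarity(x[start:end], frame_len, hop)
        for i in novelty_boundaries(ssm):
            # Map the frame index back to an approximate sample position.
            boundaries.add(start + i * hop + (frame_len - hop) // 2)
    return sorted(boundaries)
```

Calling `two_phase_segment(x)` on a mono signal `x` (a 1-D NumPy array) returns sorted sample positions of the estimated boundaries. Because the novelty score is averaged over the kernel, the threshold is absolute rather than relative to each clip's maximum, which avoids reporting spurious boundaries inside a clip that contains only one sound.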