Robust HMM-based speech/music segmentation

Author

Ajmera, Jitendra ; McCowan, Iain A. ; Bourlard, Herve

Author_Institution

Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), P. O. Box 592, CH-1920 Martigny, Switzerland

Volume

1

fYear

2002

fDate

13-17 May 2002

Abstract

In this paper we present a new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, the local probability density function (PDF) estimators trained on clean microphone speech are used as a channel model, at the output of which the entropy and “dynamism” are measured and integrated over time through a 2-state (speech and non-speech) hidden Markov model (HMM) with minimum duration constraints. The parameters of the HMM are trained using the EM algorithm in a completely unsupervised manner. Different experiments, including a variety of speech and music styles, as well as different segment durations of speech and music signals (real data distribution, mostly speech, or mostly music), illustrate the robustness of the approach, which in each case achieves a frame-level accuracy greater than 94%.
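
The two features named in the abstract can be illustrated with a minimal sketch. It assumes the channel model yields a matrix of per-frame class posteriors (one row per frame, one column per class), that entropy is the usual Shannon entropy of each row, and that "dynamism" is the squared change in posteriors between consecutive frames. These definitions, the function name entropy_and_dynamism, and the eps parameter are assumptions made for illustration and are not taken from this record; the HMM stage is not shown.

```python
import numpy as np

def entropy_and_dynamism(posteriors, eps=1e-12):
    """Per-frame entropy and frame-to-frame dynamism (illustrative definitions).

    posteriors: array of shape (n_frames, n_classes), rows summing to 1.
    Returns (entropy, dynamism) with shapes (n_frames,) and (n_frames - 1,).
    """
    p = np.asarray(posteriors, dtype=float)
    # Shannon entropy in bits; eps avoids log2(0) for zero posteriors.
    entropy = -np.sum(p * np.log2(p + eps), axis=1)
    # Assumed dynamism: squared posterior change between consecutive frames.
    dynamism = np.sum(np.diff(p, axis=0) ** 2, axis=1)
    return entropy, dynamism

# Hypothetical posteriors: one uncertain frame followed by two confident ones.
example = [[0.5, 0.5], [1.0, 0.0], [1.0, 0.0]]
entropy, dynamism = entropy_and_dynamism(example)
```

Under these assumed definitions, frames where the model is confident and changes its decision quickly would show low entropy and high dynamism, which is the kind of contrast a 2-state HMM could then smooth over time.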

Keywords

Acoustics; Entropy; Feature extraction; Hidden Markov models; Multiple signal classification; Robustness; Speech

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location

Orlando, FL, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.2002.5743713

Filename

5743713