Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora

Author

Huang, Rongqing ; Hansen, John H L

Author_Institution

Center for Spoken Language Res., Univ. of Colorado, Boulder, CO, USA

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

Unsupervised audio segmentation remains a challenging research problem that significantly impacts automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be better suited to segmentation: PMVDR (perceptual minimum variance distortionless response), SZCR (smoothed zero crossing rate), and FBLC (filterbank log coefficients). Next, we consider a new distance metric, T²-mean, which is intended to improve segmentation for short segments (< 5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We also establish an evaluation procedure for segmentation that is more effective than the traditional EER and frame accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 broadcast news 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
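
Of the features named in the abstract, the smoothed zero crossing rate is the simplest to illustrate. The abstract does not define SZCR, so the following is only a minimal sketch of one plausible reading: a per-frame zero crossing rate followed by a moving average. The function names, the frame length, the hop size, and the smoothing window are all assumptions made for illustration and are not taken from the paper.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, dropping any short tail."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def smooth(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def smoothed_zcr(samples, frame_len=4, hop=2, window=3):
    """Assumed SZCR: per-frame ZCR smoothed by a moving average.

    The default sizes are toy values for illustration only.
    """
    rates = [zero_crossing_rate(f) for f in frame_signal(samples, frame_len, hop)]
    return smooth(rates, window)


# Toy signal: rapid sign changes at the start, constant sign at the end.
toy = [1, -1, 1, -1, 1, 1, 1, 1]
szcr = smoothed_zcr(toy)
```

On the toy signal the smoothed rate falls steadily from the oscillating region to the constant one, which is the kind of trend a segmenter could use as one cue among several. Real audio would need frame and window sizes chosen for the sampling rate.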

Keywords

information retrieval; smoothing methods; speech recognition; ASR; FBLC; Hub4 broadcast news 1997; NGSW corpora; PMVDR; SDR performance; SZCR; T²-mean; automatic speech recognition; distance metric; false alarm compensation; filterbank log coefficients; multi-speaker change detection; perceptual minimum variance distortionless response; smoothed zero crossing rate; spoken document retrieval; unsupervised audio segmentation; Acoustic distortion; Automatic speech recognition; Bayesian methods; Broadcasting; Loudspeakers; Mel frequency cepstral coefficient; Natural languages; Robustness; Speech processing; Streaming media

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326092

Filename

1326092