DocumentCode
417247
Title
Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora
Author
Huang, Rongqing ; Hansen, John H L
Author_Institution
Center for Spoken Language Res., Univ. of Colorado, Boulder, CO, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
Unsupervised audio segmentation remains a challenging research problem that significantly impacts automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents novel advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation: PMVDR (perceptual minimum variance distortionless response), SZCR (smoothed zero crossing rate), and FBLC (filterbank log coefficients). Next, we consider a new distance metric, T2-mean, which is intended to improve segmentation for short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish an evaluation procedure for segmentation that is more effective than the traditional EER and frame accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 broadcast news 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
Keywords
information retrieval; smoothing methods; speech recognition; ASR; FBLC; Hub4 broadcast news 1997; NGSW corpora; PMVDR; SDR performance; SZCR; T2-mean; automatic speech recognition; distance metric; false alarm compensation; filterbank log coefficients; multi-speaker change detection; perceptual minimum variance distortionless response; smoothed zero crossing rate; spoken document retrieval; unsupervised audio segmentation; Acoustic distortion; Automatic speech recognition; Bayesian methods; Broadcasting; Loudspeakers; Mel frequency cepstral coefficient; Natural languages; Robustness; Speech processing; Streaming media;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326092
Filename
1326092