DocumentCode :
417247
Title :
Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora
Author :
Huang, Rongqing ; Hansen, John H L
Author_Institution :
Center for Spoken Language Res., Univ. of Colorado, Boulder, CO, USA
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
Unsupervised audio segmentation remains a challenging research problem that significantly impacts automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents novel advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation: PMVDR (perceptual minimum variance distortionless response), SZCR (smoothed zero crossing rate), and FBLC (filterbank log coefficients). Next, we consider a new distance metric, T2-mean, which is intended to improve segmentation for short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional EER and frame accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 broadcast news 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
Keywords :
information retrieval; smoothing methods; speech recognition; ASR; FBLC; Hub4 broadcast news 1997; NGSW corpora; PMVDR; SDR performance; SZCR; T2-mean; automatic speech recognition; distance metric; false alarm compensation; filterbank log coefficients; multi-speaker change detection; perceptual minimum variance distortionless response; smoothed zero crossing rate; spoken document retrieval; unsupervised audio segmentation; Acoustic distortion; Automatic speech recognition; Bayesian methods; Broadcasting; Loudspeakers; Mel frequency cepstral coefficient; Natural languages; Robustness; Speech processing; Streaming media;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326092
Filename :
1326092