Title :
Multistage speaker diarization of broadcast news
Author :
Barras, Claude ; Zhu, Xuan ; Meignier, Sylvain ; Gauvain, Jean-Luc
Author_Institution :
Eng. Sci.-Nat. Center for Sci. Res., LIMSI-CNRS, Orsay
Abstract :
This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity but tends to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by 40% to 50% relative to a single-stage BIC clustering system.
Keywords :
Bayes methods; Gaussian processes; broadcasting; iterative methods; pattern clustering; speaker recognition; Bayesian information criterion agglomerative clustering; ESTER evaluation data; LIMSI broadcast news transcription system; National Institute of Standards and Technology RT-04F; baseline audio partitioner; clustering system; high cluster purity; iterative Gaussian mixture model clustering; multistage segmentation; multistage speaker diarization; segment boundaries; speaker error reduction; speaker identification; split data; Background noise; Bayesian methods; Broadcasting; Computer errors; Indexing; Laboratories; Loudspeakers; NIST; Speech processing; Streaming media; Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID); speaker segmentation and clustering
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.878261