Title :
Speaker diarization and linking of large corpora
Author :
Ferras, Marc ; Bourlard, Herve
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Abstract :
Performing speaker diarization of a collection of recordings, where speakers are uniquely identified across the database, is a challenging task. In this context, both inter-session variability compensation and reasonable computation times must be addressed. In this paper we propose a two-stage system, composed of speaker diarization and speaker linking modules, that performs data-set-wide speaker diarization while handling both large volumes of data and inter-session variability compensation. The speaker linking system agglomeratively clusters speaker factor posterior distributions, obtained within the Joint Factor Analysis framework, that model the speaker clusters output by a standard speaker diarization system. The technique therefore inherently compensates for channel variability effects from recording to recording within the database. Meaningful speaker clusters are obtained by cutting the dendrogram produced by the agglomerative clustering at a threshold. We show that the Hotelling t-square statistic is an effective distance measure for this task and input data, yielding the best results and stability. The system is evaluated on three subsets of the AMI corpus involving different speaker and channel variabilities. We use the within-recording and across-recording diarization error rates (DER), cluster purity and cluster coverage to measure the performance of the proposed system. For some system setups, the across-recording DER is as low as the within-recording DER.
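The linking stage described in the abstract (agglomerative clustering of per-cluster posterior distributions, followed by a threshold cut of the dendrogram) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: each speaker cluster is summarized by a hypothetical Gaussian posterior (mean and covariance), the distance is a Hotelling-style statistic with the two covariances simply summed, average linkage stands in for whatever linkage the paper uses, and the toy data and threshold are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def t2_distance(mean_a, cov_a, mean_b, cov_b):
    """Hotelling-style statistic between two Gaussian posteriors.

    Assumed form: (m_a - m_b)^T (C_a + C_b)^{-1} (m_a - m_b).
    The exact statistic used in the paper is not given in the abstract.
    """
    diff = mean_a - mean_b
    return float(diff @ np.linalg.solve(cov_a + cov_b, diff))


def link_speakers(means, covs, threshold):
    """Cluster per-recording speaker clusters and cut the dendrogram."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = t2_distance(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    # Average linkage on the precomputed distances (an assumption).
    tree = linkage(squareform(dist), method="average")
    # Every merge above `threshold` is undone, leaving flat clusters.
    return fcluster(tree, t=threshold, criterion="distance")


# Toy input: four diarization clusters, two per underlying speaker.
means = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
covs = np.array([0.1 * np.eye(2)] * 4)
labels = link_speakers(means, covs, threshold=10.0)
print(labels)
```

Clusters that receive the same label would be treated as the same speaker across recordings. The threshold trades cluster purity against cluster coverage, so in practice it would be tuned on held-out data.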
Keywords :
pattern clustering; speaker recognition; statistics; AMI corpus; DER; Hotelling t-square statistic; across-recording diarization error rates; agglomerative clustering; channel variability; channel variability effects; cluster coverage; cluster purity; computation times; intersession variability compensation; joint factor analysis framework; large corpora linkage; speaker diarization; speaker factor posterior distribution; speaker linking modules; speaker variability; two-stage system; within-recording error rates; Adaptation models; Clustering algorithms; Data models; Density estimation robust algorithm; Joining processes; Speech; Vectors; joint factor analysis; speaker linking; Ward method
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424236