DocumentCode :
2770573
Title :
Multiple feature combination to improve speaker diarization of telephone conversations
Author :
Gupta, Vishwa ; Kenny, Patrick ; Ouellet, Pierre ; Boulianne, Gilles ; Dumouchel, Pierre
Author_Institution :
Centre de recherche informatique de Montreal, Montreal
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
705
Lastpage :
710
Abstract :
We report results on speaker diarization of telephone conversations. The diarization process is similar to the multistage segmentation and clustering systems used for broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using the Bayesian information criterion (BIC), agglomerative clustering using state-of-the-art speaker identification (SID) methods, and finally Viterbi re-segmentation using Gaussian mixture models (GMMs). The Viterbi re-segmentation using GMMs is new, and it reduces the diarization error rate (DER) by 10%. We run these multistage segmentation and clustering steps twice: once with MFCCs as the feature parameters for the GMMs used in the gender labeling, SID, and Viterbi re-segmentation steps, and once with Gaussianized MFCCs as the feature parameters for the GMMs used in the same three steps. The clusters from the two parallel runs are combined in a novel way that leads to a significant reduction in the DER. On a development set of 30 telephone conversations, this combination step reduced the DER by 20%. On a separate test set of 30 telephone conversations, it reduced the DER by 13%. The best error rates we have achieved are 6.7% on the development set and 9.0% on the test set.
Keywords :
Bayes methods; Gaussian processes; error statistics; feature extraction; gender issues; iterative methods; pattern clustering; speaker recognition; Bayesian information criterion; Gaussian mixture models; acoustic change point detection algorithm; agglomerative clustering; broadcast news; diarization error rate; gender labeling; iterative Viterbi re-segmentation; multiple feature combination; multistage segmentation-clustering system; speaker diarization process; state-of-the-art speaker identification methods; telephone conversations; Broadcasting; Density estimation robust algorithm; Detection algorithms; Error analysis; Iterative methods; Labeling; Loudspeakers; Telephony; Testing; Viterbi algorithm; BIC clustering; SID clustering; speaker diarization; speaker segmentation and clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430198
Filename :
4430198