DocumentCode :
939704
Title :
Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations
Author :
Gupta, Vishwa ; Kenny, Patrick ; Ouellet, Pierre ; Boulianne, Gilles ; Dumouchel, Pierre
Author_Institution :
Centre de Recherche Inf. de Montreal, Montreal
Volume :
14
Issue :
12
fYear :
2007
Firstpage :
1040
Lastpage :
1043
Abstract :
We report results on speaker diarization of telephone conversations. The diarization process is similar to the multistage segmentation and clustering system used for broadcast news. It consists of initial acoustic change-point detection, iterative Viterbi re-segmentation, gender labeling, and agglomerative clustering using the Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification (SID) methods and Viterbi re-segmentation using Gaussian mixture models (GMMs). We run these multistage segmentation and clustering steps twice: once with mel-frequency cepstral coefficients (MFCCs) as the features for the GMMs used in the gender labeling, SID, and Viterbi re-segmentation steps, and once with Gaussianized MFCCs as the features for the GMMs used in those same three steps. The clusters from the two parallel runs are then combined in a novel way that significantly reduces the diarization error rate (DER). On a development set of 30 telephone conversations, this combination step reduced the DER by 20%; on a separate test set of 30 telephone conversations, it reduced the DER by 13%. The best DERs achieved are 6.7% on the development set and 9.0% on the test set.
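To make the BIC clustering step concrete, the sketch below implements the widely used ΔBIC merge test, assuming each cluster is modelled by a single full-covariance Gaussian. The abstract does not state the authors' exact BIC variant or penalty weight, so `delta_bic`, `lam`, and the helper functions are hypothetical names. The quantity computed is ΔBIC = (n/2) log|Σ| - (n1/2) log|Σ1| - (n2/2) log|Σ2| - λ (1/2)(d + d(d+1)/2) log n, where Σ is the covariance of the pooled frames; a negative value means one shared Gaussian explains both clusters better after the complexity penalty, so the pair is a merge candidate.

```python
import math
import random
from typing import List, Sequence

Frame = Sequence[float]


def covariance(frames: List[Frame]) -> List[List[float]]:
    """Maximum-likelihood (divide-by-n) full covariance of d-dim frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    return cov


def log_det(matrix: List[List[float]]) -> float:
    """log|A| for a symmetric positive-definite A, via Cholesky A = L L^T."""
    d = len(matrix)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))


def delta_bic(x: List[Frame], y: List[Frame], lam: float = 1.0) -> float:
    """Hypothetical ΔBIC merge test: one Gaussian for x+y vs. one Gaussian each.

    Negative values favour merging the two clusters.
    """
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, len(x[0])
    # Free parameters of one full-covariance Gaussian: mean (d) + covariance.
    num_params = d + d * (d + 1) / 2
    penalty = 0.5 * num_params * math.log(n)
    return (0.5 * n * log_det(covariance(list(x) + list(y)))
            - 0.5 * n1 * log_det(covariance(x))
            - 0.5 * n2 * log_det(covariance(y))
            - lam * penalty)


# Toy 2-D "clusters": a and b share a distribution, c has a shifted mean.
rng = random.Random(0)
a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
b = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
c = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(200)]
print(f"a vs b (same distribution): {delta_bic(a, b):.1f}")
print(f"a vs c (shifted mean):      {delta_bic(a, c):.1f}")
```

In an agglomerative loop one would repeatedly merge the pair with the lowest ΔBIC until no remaining pair has ΔBIC < 0; λ is a tuning parameter that controls how readily clusters merge.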
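The abstract does not say how the MFCCs were Gaussianized. One common technique with that effect is short-time feature warping, which replaces each coefficient by the standard-normal quantile of its rank within a sliding window, so every dimension ends up approximately N(0, 1) locally. The sketch below assumes that technique and a default 301-frame window (about 3 s at a 10 ms frame shift); `warp_stream` and `gaussianize` are hypothetical names, not the authors' code.

```python
import random
from statistics import NormalDist
from typing import List

_STD_NORMAL = NormalDist(0.0, 1.0)


def warp_stream(values: List[float], window: int = 301) -> List[float]:
    """Short-time feature warping of one coefficient stream.

    Each value is replaced by the standard-normal quantile of its rank
    inside a centred window (truncated at the stream edges).
    """
    half = window // 2
    warped = []
    for t, x in enumerate(values):
        lo, hi = max(0, t - half), min(len(values), t + half + 1)
        win = values[lo:hi]
        rank = 1 + sum(1 for v in win if v < x)  # 1-based ascending rank
        p = (rank - 0.5) / len(win)               # always strictly inside (0, 1)
        warped.append(_STD_NORMAL.inv_cdf(p))
    return warped


def gaussianize(frames: List[List[float]], window: int = 301) -> List[List[float]]:
    """Warp every feature dimension (e.g. every MFCC) independently."""
    dims = len(frames[0])
    cols = [warp_stream([f[j] for f in frames], window) for j in range(dims)]
    return [[cols[j][t] for j in range(dims)] for t in range(len(frames))]


# Toy stream with a skewed and a flat dimension; the window spans all 500 frames.
rng = random.Random(1)
frames = [[rng.expovariate(1.0), rng.uniform(-5.0, 5.0)] for _ in range(500)]
warped = gaussianize(frames, window=1001)
col = [f[0] for f in warped]
print("mean:", sum(col) / len(col), "var:", sum(v * v for v in col) / len(col))
```

Warping preserves the ordering of values within each window while discarding their original scale and shape. The window length is a tuning choice: shorter windows follow local channel changes more closely but yield coarser quantiles.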
Keywords :
Bayes methods; Gaussian processes; cepstral analysis; iterative methods; speaker recognition; Bayesian information criterion; Gaussian mixture models; acoustic change point detection algorithm; agglomerative clustering; broadcast news; clustering system; diarization error rate; gender labeling; iterative Viterbi re-segmentation; mel-frequency cepstral coefficients; multistage segmentation; non-Gaussianized features; speaker diarization; speaker identification; telephone conversations; Bayesian information criterion (BIC) clustering; speaker identification (SID) clustering; speaker segmentation and clustering;
fLanguage :
English
Journal_Title :
IEEE Signal Processing Letters
Publisher :
IEEE
ISSN :
1070-9908
Type :
jour
DOI :
10.1109/LSP.2007.905088
Filename :
4358017