DocumentCode
939704
Title
Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations
Author
Gupta, Vishwa ; Kenny, Patrick ; Ouellet, Pierre ; Boulianne, Gilles ; Dumouchel, Pierre
Author_Institution
Centre de Recherche Informatique de Montréal (CRIM), Montreal
Volume
14
Issue
12
fYear
2007
Firstpage
1040
Lastpage
1043
Abstract
We report results on speaker diarization of telephone conversations. This speaker diarization process is similar to the multistage segmentation and clustering system used in broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using a Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification (SID) methods and Viterbi re-segmentation using Gaussian mixture models (GMMs). We repeat these multistage segmentation and clustering steps twice: once with mel-frequency cepstral coefficients (MFCCs) as feature parameters for the GMMs used in gender labeling, SID, and Viterbi re-segmentation steps and another time with Gaussianized MFCCs as feature parameters for the GMMs used in these three steps. The resulting clusters from the parallel runs are combined in a novel way that leads to a significant reduction in the diarization error rate (DER). On a development set containing 30 telephone conversations, this combination step reduced the DER by 20%. On another test set containing 30 telephone conversations, this step reduced the DER by 13%. The best error rate we have achieved is 6.7% on the development set and 9.0% on the test set.
Keywords
Bayes methods; Gaussian processes; cepstral analysis; iterative methods; speaker recognition; Bayesian information criterion; Gaussian mixture models; acoustic change point detection algorithm; agglomerative clustering; broadcast news; clustering system; diarization error rate; gender labeling; iterative Viterbi re-segmentation; mel-frequency cepstral coefficients; multistage segmentation; non-Gaussianized features; speaker diarization; speaker identification; telephone conversations; Bayesian information criterion (BIC) clustering; speaker identification (SID) clustering; speaker segmentation and clustering
fLanguage
English
Journal_Title
Signal Processing Letters, IEEE
Publisher
IEEE
ISSN
1070-9908
Type
jour
DOI
10.1109/LSP.2007.905088
Filename
4358017