• DocumentCode
    939704
  • Title
    Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations
  • Author
    Gupta, Vishwa; Kenny, Patrick; Ouellet, Pierre; Boulianne, Gilles; Dumouchel, Pierre
  • Author_Institution
    Centre de Recherche Informatique de Montréal (CRIM), Montréal
  • Volume
    14
  • Issue
    12
  • fYear
    2007
  • Firstpage
    1040
  • Lastpage
    1043
  • Abstract
    We report results on speaker diarization of telephone conversations. The diarization process is similar to the multistage segmentation and clustering systems used for broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, and agglomerative clustering using the Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification (SID) methods and Viterbi re-segmentation using Gaussian mixture models (GMMs). We run this multistage segmentation and clustering twice: once with mel-frequency cepstral coefficients (MFCCs) as the feature parameters for the GMMs used in the gender labeling, SID, and Viterbi re-segmentation steps, and once with Gaussianized MFCCs as the feature parameters for the GMMs in these three steps. The clusters from the two parallel runs are combined in a novel way that significantly reduces the diarization error rate (DER). On a development set of 30 telephone conversations, this combination step reduced the DER by 20%; on a separate test set of 30 telephone conversations, it reduced the DER by 13%. Our best DERs are 6.7% on the development set and 9.0% on the test set.
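    Two mechanisms named in the abstract, Gaussianizing features and the BIC merge test, can be illustrated with a short Python sketch. It is not the authors' implementation and rests on stated assumptions: Gaussianization is taken to mean short-time feature warping (each coefficient's rank within a sliding window mapped to a standard-normal quantile), the 301-frame window is an assumed value, and the ΔBIC test uses diagonal-covariance Gaussians with a penalty weight `lam`. The names `warp_features` and `delta_bic` and all their parameters are hypothetical.

```python
import math
from statistics import NormalDist

# Standard normal (mean 0, variance 1) used as the target distribution.
_STD_NORMAL = NormalDist()


def warp_features(frames, window=301):
    """Hypothetical short-time feature warping (one form of Gaussianization).

    Each coefficient is replaced by the standard-normal quantile of its rank
    among the same coefficient in a sliding window centred on the frame.
    `frames` is a list of equal-length feature vectors (e.g. MFCC frames);
    `window` is an assumed odd window length in frames.
    """
    n, dim, half = len(frames), len(frames[0]), window // 2
    warped = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        size = hi - lo
        row = []
        for d in range(dim):
            x = frames[t][d]
            # 1-based rank of x among the window's values for coefficient d.
            rank = 1 + sum(1 for s in range(lo, hi) if frames[s][d] < x)
            # Bin midpoint keeps the probability strictly inside (0, 1).
            row.append(_STD_NORMAL.inv_cdf((rank - 0.5) / size))
        warped.append(row)
    return warped


def _log_det_diag(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(seg_a, seg_b, lam=1.0):
    """Hypothetical ΔBIC test between two clusters of frames.

    Compares one diagonal Gaussian for the pooled frames against one per
    cluster. Positive favours keeping the clusters apart; <= 0 favours merging.
    """
    n1, n2 = len(seg_a), len(seg_b)
    n, dim = n1 + n2, len(seg_a[0])
    # Log-likelihood gain of two Gaussians over one (constant terms cancel).
    gain = 0.5 * (n * _log_det_diag(seg_a + seg_b)
                  - n1 * _log_det_diag(seg_a)
                  - n2 * _log_det_diag(seg_b))
    # The second model adds `dim` means and `dim` variances.
    penalty = lam * 0.5 * (2 * dim) * math.log(n)
    return gain - penalty
```

    In an agglomerative BIC stage built on this sketch, one would repeatedly merge the pair of clusters with the lowest `delta_bic` and stop once no pair scores at or below zero; the penalty weight `lam` is typically tuned on development data.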
  • Keywords
    Bayes methods; Gaussian processes; cepstral analysis; iterative methods; speaker recognition; Bayesian information criterion; Gaussian mixture models; acoustic change point detection algorithm; agglomerative clustering; broadcast news; clustering system; diarization error rate; gender labeling; iterative Viterbi re-segmentation; mel-frequency cepstral coefficients; multistage segmentation; non-Gaussianized features; speaker diarization; speaker identification; telephone conversations; Bayesian information criterion (BIC) clustering; speaker identification (SID) clustering; speaker segmentation and clustering
  • fLanguage
    English
  • Journal_Title
    IEEE Signal Processing Letters
  • Publisher
    IEEE
  • ISSN
    1070-9908
  • Type
    jour
  • DOI
    10.1109/LSP.2007.905088
  • Filename
    4358017