• DocumentCode
    730758
  • Title
    Combining SGMM speaker vectors and KL-HMM approach for speaker diarization
  • Author
    Madikeri, Srikanth ; Motlicek, Petr ; Bourlard, Herve
  • Author_Institution
    Idiap Research Institute, Martigny, Switzerland
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4834
  • Lastpage
    4838
  • Abstract
    In this paper, a method for using Subspace Gaussian Mixture Model (SGMM) speaker vectors in speaker diarization is introduced. The architecture of the Information Bottleneck (IB)-based speaker diarization system is used for this purpose. The audio to be diarized is split into short uniform segments. Speaker vectors are obtained from an SGMM system trained on meeting data and are clustered with the K-means algorithm. Two distance measures are explored in the clustering step: the cosine distance between the speaker vectors, and the cosine distance between the vectors after projection into a Probabilistic Linear Discriminant Analysis (PLDA) space. The clustering output initializes the Kullback-Leibler Hidden Markov Model (KL-HMM) based speech segmentation commonly used in the IB system for diarization. The proposed method is compared with clustering the segments using the IB-based approach. On the NIST RT 09 dataset, the proposed approach using SGMM speaker vectors with PLDA yields a relative improvement of approximately 14% in diarization performance.
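The clustering step described in the abstract (K-means over per-segment speaker vectors under cosine distance, optionally after a linear projection) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. Assumptions: speaker vectors are already extracted, one row per uniform segment (SGMM extraction is not shown); the optional `projection` matrix is a hypothetical stand-in for a trained PLDA projection (no PLDA training is done here); the farthest-first initialisation is an illustrative choice, not taken from the paper; the KL-HMM resegmentation that the paper initialises with these labels is not sketched. The data in the usage part are synthetic.

```python
import numpy as np

def cosine_kmeans(vectors, k, projection=None, n_iter=50):
    """K-means under cosine distance (spherical K-means).

    vectors    : (n_segments, dim) array, one speaker vector per segment
    projection : optional (dim, out_dim) matrix applied first; a hypothetical
                 stand-in for a trained PLDA projection
    Returns one integer cluster label per segment.
    """
    x = np.asarray(vectors, dtype=float)
    if projection is not None:
        x = x @ projection
    # Length-normalise so that a dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)

    # Farthest-first initialisation (illustrative assumption): start from the
    # first segment, then repeatedly add the segment least similar to all
    # centroids chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        sims = np.max(x @ np.array(centroids).T, axis=1)
        centroids.append(x[np.argmin(sims)])
    centroids = np.array(centroids)

    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Smallest cosine distance == largest cosine similarity.
        new_labels = np.argmax(x @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return labels

# Usage on synthetic data: two hypothetical "speakers", 15 segments each.
rng = np.random.default_rng(1)
spk_a = rng.normal(size=20)
spk_b = rng.normal(size=20)
segs = np.vstack([spk_a + 0.1 * rng.normal(size=(15, 20)),
                  spk_b + 0.1 * rng.normal(size=(15, 20))])
labels = cosine_kmeans(segs, k=2)
print(labels)
```

Length-normalising first means the mean of each cluster, renormalised, is the centroid that maximises total cosine similarity to its members, which is why the update step divides by the norm rather than using the plain mean.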
  • Keywords
    Gaussian processes; hidden Markov models; mixture models; speaker recognition; K-means algorithm; KL-HMM approach; Kullback-Leibler hidden Markov model; NIST RT 09 dataset; PLDA; SGMM speaker vectors; cosine distance; distance measures; information bottleneck; probabilistic linear discriminant analysis; short uniform segments; speaker diarization; speech segmentation; subspace Gaussian mixture model; Clustering algorithms; Computational modeling; Computer architecture; Hidden Markov models; NIST; Speech; Speech processing; K-means; SGMM; speaker diarization; speaker vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178889
  • Filename
    7178889