DocumentCode :
730690
Title :
KL-HMM based speaker diarization system for meetings
Author :
Madikeri, Srikanth ; Bourlard, Herve
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4435
Lastpage :
4439
Abstract :
In this paper, the Kullback-Leibler Hidden Markov Model (KL-HMM) is applied to unsupervised speaker diarization. A general approach to speaker diarization is to split the audio into uniform segments and then run one or more iterations of segment clustering and audio resegmentation. In the Information Bottleneck (IB) approach to diarization, short uniform segments are clustered using the IB criterion, followed by resegmentation with KL-HMM, which has been shown to be an effective resegmentation procedure in that setting. This motivates exploring KL-HMM as an independent diarization system, in which the uniform segments are clustered and segmented using a sequence of posteriors obtained from the audio with respect to a Gaussian Mixture Model (GMM). Segmentation is performed using the KL divergence, while the Jensen-Shannon (JS) divergence is used for clustering. The diarization procedure is stopped by applying a Normalized Mutual Information (NMI) based criterion to two consecutive clustering outputs. The proposed method is tested on the NIST RT datasets. A best-case relative improvement of 30% in Speaker Error Rate (SER) is observed on the NIST RT 09 dataset when compared with the IB approach.
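The three quantities named in the abstract (KL divergence, JS divergence, and NMI between two clusterings) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing constant `eps`, and the choice to normalize mutual information by the geometric mean of the two entropies are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete posterior vectors of equal length.

    eps is an illustrative smoothing constant (an assumption) that
    avoids division by zero when q has zero-probability entries.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two clusterings of the same segments.

    Normalizes by the geometric mean of the two label entropies,
    which is one common convention and an assumption here.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    # Degenerate case: at least one clustering has a single cluster.
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


if __name__ == "__main__":
    # Two hypothetical segment-level posterior vectors over 3 GMM components.
    seg_1 = [0.7, 0.2, 0.1]
    seg_2 = [0.1, 0.3, 0.6]
    print("KL(seg_1 || seg_2):", kl_divergence(seg_1, seg_2))
    print("JS(seg_1, seg_2):  ", js_divergence(seg_1, seg_2))

    # Cluster labels from two consecutive (hypothetical) iterations.
    previous = [0, 0, 1, 1, 2, 2]
    current = [1, 1, 0, 0, 2, 2]
    print("NMI:", normalized_mutual_information(previous, current))
```

Because NMI ignores how clusters are named, two consecutive clusterings that differ only by a relabeling score as identical. One plausible reading of the stopping criterion is to halt once the NMI between consecutive outputs is close to 1; any specific threshold would be an assumption, as the abstract does not give one.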
Keywords :
Gaussian processes; audio signal processing; hidden Markov models; mixture models; pattern clustering; speaker recognition; Gaussian mixture model; IB criterion; JS divergence; Jensen-Shannon divergence; KL-HMM based speaker diarization system; Kullback-Leibler hidden Markov model; NMI; SER; audio resegmentation; clustering iteration; independent diarization system; information bottleneck approach; normalized mutual information; speaker error rate; Computational modeling; Hidden Markov models; Indium tin oxide; Hidden Markov Models; Kullback Leibler divergence; speaker diarization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178809
Filename :
7178809