Regional Information Center for Science and Technology (RICEST) - KL-HMM based speaker diarization system for meetings

DocumentCode :

730690

Title :

KL-HMM based speaker diarization system for meetings

Author :

Madikeri, Srikanth ; Bourlard, Herve

Author_Institution :

Idiap Res. Inst., Martigny, Switzerland

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

4435

Lastpage :

4439

Abstract :

In this paper, the Kullback-Leibler Hidden Markov Model (KL-HMM) is applied to unsupervised diarization of speech. A general approach to speaker diarization is to split the audio into uniform segments, followed by one or more iterations of clustering the segments and resegmenting the audio. In the Information Bottleneck (IB) approach to diarization, short uniform segments are clustered using the IB criterion, followed by resegmentation with KL-HMM. The KL-HMM approach has been shown to be an effective resegmentation procedure in this respect. Thus, the potential of KL-HMM as an independent diarization system is explored, where the uniform segments are clustered and segmented using a sequence of posteriors obtained from the audio with respect to a Gaussian Mixture Model (GMM). The segmentation is performed using KL divergence, while the Jensen-Shannon (JS) divergence is used for clustering. The diarization procedure is stopped by applying a Normalized Mutual Information (NMI) based criterion between two consecutive clustering outputs. The proposed method is tested on the NIST RT datasets. A best-case relative improvement of 30% in Speaker Error Rate (SER) is observed on the NIST RT 09 dataset when compared with the IB approach.
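The three quantities named in the abstract (KL divergence for segmentation, JS divergence for clustering, NMI for the stopping criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing constant `EPS`, the direction of the KL divergence in `assign_frame`, and the geometric-mean normalization used in `nmi` are all assumptions made for illustration only.

```python
import math
from collections import Counter

EPS = 1e-12  # assumed smoothing floor to avoid log(0); not from the paper


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions (e.g. GMM posteriors)."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS))
               for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def assign_frame(posterior, cluster_dists):
    """Index of the cluster distribution closest to a posterior vector.

    The KL direction used here is an assumption for illustration.
    """
    scores = [kl_divergence(c, posterior) for c in cluster_dists]
    return scores.index(min(scores))


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two clusterings.

    Normalized by sqrt(H(a) * H(b)), one common convention (assumed here).
    """
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Degenerate case: at least one clustering has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)
```

In a loop of this kind, one would compare the cluster labels from two consecutive iterations with `nmi` and stop once the value exceeds a chosen threshold; the threshold itself is not given in the abstract.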

Keywords :

Gaussian processes; audio signal processing; hidden Markov models; mixture models; pattern clustering; speaker recognition; Gaussian mixture model; IB criterion; JS divergence; Jensen-Shannon divergence; KL-HMM based speaker diarization system; Kullback-Leibler hidden Markov model; NMI; SER; audio resegmentation; clustering iteration; independent diarization system; information bottleneck approach; normalized mutual information; speaker error rate; Computational modeling; Hidden Markov models; Indium tin oxide; Hidden Markov Models; Kullback Leibler divergence; speaker diarization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location :

South Brisbane, QLD

Type :

conf

DOI :

10.1109/ICASSP.2015.7178809

Filename :

7178809

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=730690