Title :
Mahalanobis based emission model for speaker diarization of telephone conversations
Author :
Furmanov, Tal ; Aminov, Lidiya ; Moyal, Ami ; Lapidot, Itshak
Author_Institution :
Appl. Mater., Rehovot, Israel
Abstract :
The primary objective of any speaker diarization system is to assign each speech segment to one of K speakers in the conversation. In this work we focus on telephone conversations, where the number of speakers is given and equals 2. We use a hidden-distortion-model (HDM)-based system. HDM allows different emission models to be used as speaker models. The choice of adequate emission models that properly represent the data characteristics is important for the system's performance. We investigate the effect of several codebook (CB)-based emission models, with Euclidean and Mahalanobis distances. The Mahalanobis distance was chosen due to its potential to produce a better representation of the data's spatial layout, while constraints were imposed to keep the model from diverging. The influence of the different methods is evaluated using 108 telephone conversations taken from the LDC CallHome corpus. All the experiments achieved results poorer than the original SOM-based system (DER=12.70%).
Keywords :
mobile radio; principal component analysis; speaker recognition; Euclidian distances; HDM based system; K speakers; LDC CallHome corpus; Mahalanobis based emission model; Mahalanobis distances; data characteristics; hidden distortion-model; several codebooks; spatial layout; speaker diarization; speaker models; speech segment designation; telephone conversations; Covariance matrices; Density estimation robust algorithm; Hidden Markov models; Speech; Standards; Training; Vectors; Hidden-distortion model (HDM); K-means; Mahalanobis distance; self-organizing maps (SOM); speaker diarization;
Conference_Titel :
Electrical & Electronics Engineers in Israel (IEEEI), 2014 IEEE 28th Convention of
Conference_Location :
Eilat
Print_ISBN :
978-1-4799-5987-7
DOI :
10.1109/EEEI.2014.7005740