DocumentCode :
2770547
Title :
A fast-match approach for robust, faster than real-time speaker diarization
Author :
Huang, Yan ; Vinyals, Oriol ; Friedland, Gerald ; Müller, Christian ; Mirghafori, Nikki ; Wooters, Chuck
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
693
Lastpage :
698
Abstract :
During the past few years, speaker diarization has achieved satisfying accuracy in terms of speaker Diarization Error Rate (DER). The most successful approaches, based on agglomerative clustering, however, exhibit an inherent computational complexity which makes real-time processing, especially in combination with further processing steps, almost impossible. In this article we present a framework to speed up agglomerative clustering speaker diarization. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and accurate model selection via the Bayesian Information Criterion (BIC). Two strategies, one based on the pitch-correlogram and one on the unscented-transform-based approximation of KL-divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system, using the KL-divergence fast-match strategy, performs only 14% of the total BIC comparisons needed in the baseline system and speeds up the system by 41% without affecting the DER. The result is a robust and faster than real-time speaker diarization system.
Keywords :
Bayes methods; computational complexity; pattern clustering; speaker recognition; Bayesian information criterion; baseline system; computational complexity; diarization error rate; fast-match approach; index agglomerative clustering speaker diarization; model selection; real-time processing; real-time speaker diarization; Automatic speech recognition; Bayesian methods; Computer science; Density estimation robust algorithm; Error analysis; Iterative methods; Merging; Real time systems; Robustness; Runtime; BIC; KL-divergence; Speaker diarization; fast-match; pitch-correlogram;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430196
Filename :
4430196