• DocumentCode
    2770547
  • Title

    A fast-match approach for robust, faster than real-time speaker diarization

  • Author

    Huang, Yan ; Vinyals, Oriol ; Friedland, Gerald ; Müller, Christian ; Mirghafori, Nikki ; Wooters, Chuck

  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    693
  • Lastpage
    698
  • Abstract
    During the past few years, speaker diarization has achieved satisfactory accuracy in terms of speaker Diarization Error Rate (DER). The most successful approaches, based on agglomerative clustering, however, exhibit an inherent computational complexity which makes real-time processing, especially in combination with further processing steps, almost impossible. In this article we present a framework to speed up agglomerative clustering speaker diarization. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and accurate model selection via the Bayesian Information Criterion (BIC). Two strategies, based on the pitch-correlogram and on the unscented-transform-based approximation of KL-divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system, using the KL-divergence fast-match strategy, performs only 14% of the total BIC comparisons needed in the baseline system and speeds up the system by 41% without affecting the speaker Diarization Error Rate (DER). The result is a robust and faster than real-time speaker diarization system.
  • Keywords
    Bayes methods; computational complexity; pattern clustering; speaker recognition; Bayesian information criterion; baseline system; diarization error rate; fast-match approach; index agglomerative clustering speaker diarization; model selection; real-time processing; real-time speaker diarization; Automatic speech recognition; Bayesian methods; Computer science; Density estimation robust algorithm; Error analysis; Iterative methods; Merging; Real time systems; Robustness; Runtime; BIC; KL-divergence; Speaker diarization; fast-match; pitch-correlogram
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
  • Conference_Location
    Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type

    conf

  • DOI
    10.1109/ASRU.2007.4430196
  • Filename
    4430196