DocumentCode :
3484412
Title :
Fast speaker diarization using a high-level scripting language
Author :
Gonina, Ekaterina ; Friedland, Gerald ; Cook, Henry ; Keutzer, Kurt
Author_Institution :
Univ. of California, Berkeley, CA, USA
fYear :
2011
fDate :
11-15 Dec. 2011
Firstpage :
553
Lastpage :
558
Abstract :
Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine “who spoke when” in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly because of GMM training, and this limits current approaches to roughly real-time performance. Growing datasets require processing hundreds of hours of data, which makes more efficient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster-than-real-time performance by exploiting parallelism in the training computation. However, developing and maintaining complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and are not portable to other platforms, which limits programmer productivity. In this paper, we present a speaker diarization system captured in under 50 lines of Python that runs 50-250× faster than real time by using a specialization framework to automatically map and execute computationally intensive GMM training on an NVIDIA GPU, without significant loss in accuracy.
Keywords :
audio recording; authoring languages; speaker recognition; Gaussian mixture models; agglomerative clustering; datasets; fast speaker diarization; high level scripting language; Covariance matrix; Graphics processing unit; Hardware; Instruction sets; Kernel; Real time systems; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
Type :
conf
DOI :
10.1109/ASRU.2011.6163887
Filename :
6163887