Regional Information Center for Science and Technology (RICeST) - Fast speaker diarization using a high-level scripting language

DocumentCode :

3484412

Title :

Fast speaker diarization using a high-level scripting language

Author :

Gonina, Ekaterina ; Friedland, Gerald ; Cook, Henry ; Keutzer, Kurt

Author_Institution :

Univ. of California, Berkeley, CA, USA

fYear :

2011

fDate :

11-15 Dec. 2011

Firstpage :

553

Lastpage :

558

Abstract :

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine “who spoke when” in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, which limits current approaches to roughly real-time performance. Growing datasets require processing hundreds of hours of data, making more efficient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster-than-real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and are not portable to other platforms, limiting programmer productivity. In this paper we present a speaker diarization system captured in under 50 lines of Python that runs 50-250× faster than real time by using a specialization framework to automatically map and execute computationally intensive GMM training on an NVIDIA GPU, without significant loss in accuracy.

Keywords :

audio recording; authoring languages; speaker recognition; Gaussian mixture models; agglomerative clustering; datasets; fast speaker diarization; high level scripting language; Covariance matrix; Graphics processing unit; Hardware; Instruction sets; Kernel; Real time systems; Training

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on

Conference_Location :

Waikoloa, HI

Print_ISBN :

978-1-4673-0365-1

Electronic_ISBN :

978-1-4673-0366-8

Type :

conf

DOI :

10.1109/ASRU.2011.6163887

Filename :

6163887

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3484412