Regional Information Center for Science and Technology (RICEST) - The ICSI RT-09 Speaker Diarization System

DocumentCode :

1531996

Title :

The ICSI RT-09 Speaker Diarization System

Author :

Friedland, Gerald ; Janin, Adam ; Imseng, David ; Miro, Xavier Anguera ; Gottlieb, Luke ; Huijbregts, Marijn ; Knox, Mary Tai ; Vinyals, Oriol

Author_Institution :

Int. Comput. Sci. Inst., Berkeley, CA, USA

Volume :

Issue :

Year :

2012

Firstpage :

371

Lastpage :

381

Abstract :

The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications over the years, previous articles presented only changes and improvements rather than a description of the full system. Attempting to replicate the ICSI speaker diarization system as a complete entity would require an extensive literature review, and might ultimately fail because the published component descriptions correspond to different versions of the system. This paper therefore presents the first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation. The system consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches. Some of the components, such as the online system, have not been previously described. The paper also includes all necessary preprocessing steps, such as Wiener filtering, speech activity detection, and beamforming.

Keywords :

speech processing; ICSI RT-09 speaker diarization system; International Computer Science Institute; NIST RT-09 evaluation; National Institute of Standards and Technology Rich Transcription 2009 evaluation; Wiener filtering; beamforming; conceptual description; multistream implementation; offline subsystem; online subsystem; single-stream implementation; speech activity detection; Channel estimation; Data models; Delay; Hidden Markov models; Mel frequency cepstral coefficient; Microphones; Speech; Gaussian mixture models (GMMs); machine learning; speaker diarization

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2158419

Filename :

5783332

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1531996