DocumentCode
1063190
Title
Prosodic and other Long-Term Features for Speaker Diarization
Author
Friedland, Gerald ; Vinyals, Oriol ; Huang, Yan ; Müller, Christian
Author_Institution
Int. Comput. Sci. Inst., Berkeley, CA
Volume
17
Issue
5
fYear
2009
fDate
7/1/2009 12:00:00 AM
Firstpage
985
Lastpage
993
Abstract
Speaker diarization is defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized datasets (NIST RT) and show a consistent improvement of about 30% relative in diarization error rate compared to the best system presented at the NIST evaluation in 2007.
Keywords
audio signal processing; cepstral analysis; MFCC; audio track; long-term features; mel-frequency cepstral coefficients; speaker diarization; speaker discriminability; Cepstral analysis; Computer science; Density estimation robust algorithm; Error analysis; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech analysis; Speech processing; System testing; Long-term features; prosody; speaker diarization;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2009.2015089
Filename
5067417