DocumentCode :
1427806
Title :
Simultaneous Speech Detection With Spatial Features for Speaker Diarization
Author :
Zelenák, Martin ; Segura, Carlos ; Luque, Jordi ; Hernando, Javier
Author_Institution :
Dept. of Signal Theor. & Commun., Univ. Politec. de Catalunya, Barcelona, Spain
Volume :
20
Issue :
2
fYear :
2012
Firstpage :
436
Lastpage :
446
Abstract :
Simultaneous speech poses a challenging problem for conventional speaker diarization systems. In meeting data, a substantial amount of missed speech error is due to speaker overlaps, since usually only one speaker label per segment is assigned. Furthermore, simultaneous speech included in training data can lead to corrupt speaker models and thus worse segmentation performance. In this paper, we propose the use of three spatial cross-correlation-based features together with spectral information for speaker overlap detection on distant microphones. Different microphone-pair data are fused by means of principal component analysis. We have obtained an improvement of the speaker diarization system over the baseline by discarding overlap segments from model training and assigning two speaker labels to them according to likelihoods in Viterbi decoding. In experiments conducted on the AMI Meeting corpus, we achieve a relative DER reduction of 11.2% and 17.0% for single- and multi-site data, respectively. The improvement of clustering with techniques such as beamforming and TDOA-feature stream also leads to a higher effectiveness of the overlap labeling algorithm. Preliminary experiments with NIST RT data show DER improvement on the RT'09 meeting recordings as well.
Keywords :
Viterbi decoding; correlation methods; feature extraction; microphones; principal component analysis; speaker recognition; AMI meeting corpus; DER reduction; NIST RT data; RT'09 meeting recording; TDOA-feature stream; Viterbi decoding; corrupt speaker model; distant microphone; microphone-pair data; model training; multisite data; overlap labeling algorithm; principal component analysis; simultaneous speech detection; spatial cross-correlation-based feature; speaker diarization system; speaker overlap detection; spectral information; missed speech error; Coherence; Feature extraction; Hidden Markov models; Microphones; Speech; Speech processing; Training; Cross-correlation; spatial features; speaker diarization; speaker overlap detection;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2160167
Filename :
6136544