DocumentCode
1096921
Title
Acoustic Beamforming for Speaker Diarization of Meetings
Author
Anguera, Xavier ; Wooters, Chuck ; Hernando, Javier
Author_Institution
Int. Comput. Sci. Inst., Berkeley
Volume
15
Issue
7
fYear
2007
Firstpage
2011
Lastpage
2022
Abstract
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
Keywords
Viterbi detection; acoustic signal processing; microphones; speaker recognition; time-of-arrival estimation; TDOA; Viterbi postprocessing; acoustic beamforming; blind reference-channel selection; dynamic output signal weighting algorithm; meeting room domain; meetings; multiple microphones; speaker diarization; speech recognition; two-step time delay of arrival; Acoustic testing; Ambient intelligence; Array signal processing; Associate members; Computer science; Loudspeakers; Microphone arrays; Signal processing; Speech; Switches; Acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2007.902460
Filename
4291588