Simultaneous multispeaker segmentation for automatic meeting recognition

Author

Laskowski, Kornel ; Fugen, Christian ; Schultz, Tanja

Author_Institution

interACT, Univ. Karlsruhe, Karlsruhe, Germany

fYear

2007

fDate

3-7 Sept. 2007

Firstpage

1294

Lastpage

1298

Abstract

Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have been shown to be ineffective. This is primarily due to the problem of crosstalk, in which a participant's speech appears on other participants' microphones, making it hard to attribute detected speech to its correct speaker. We describe an automatic multichannel segmentation system for meeting recognition, which accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development set word error rate, achieved by a state-of-the-art multi-pass speech recognition system, by 62% relative to manual segmentation. We also observe significant performance improvements on unseen data.

Keywords

speaker recognition; automatic meeting recognition; automatic multichannel segmentation system; automatic speech recognition; automatic speech understanding; multispeaker segmentation; vocal activity detection; word error rate; Acoustics; Crosstalk; Manuals; Microphones; Silicon; Speech; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Conference, 2007 15th European

Conference_Location

Poznan

Print_ISBN

978-839-2134-04-6

Type

conf

Filename

7099014