Blind Speech Separation in a Meeting Situation with Maximum SNR Beamformers

Author

Araki, Shoko ; Sawada, Hiroshi ; Makino, Shoji

Author_Institution

NTT Commun. Sci. Lab., NTT Corp., Tokyo

Volume

1

fYear

2007

fDate

15-20 April 2007

Abstract

We propose a speech separation method for a meeting situation, where each speaker speaks only intermittently and the number of active speakers changes from moment to moment. Many source separation methods have already been proposed; however, they assume that all the speakers keep speaking, which is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise ratio (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem of the MaxSNR beamformer and provide solutions. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.

Keywords

blind source separation; speaker recognition; speech processing; blind speech separation; maximum SNR beamformers; maximum signal-to-noise; meeting situation; online clustering; source separation methods; speech classification; speech detection; voice activity detector; Fourier transforms; Frequency conversion; Frequency response; Information science; Interference; Proposals; Reverberation; Speech; Time frequency analysis; Wideband; Speech separation; maximum SNR beamformer; scaling ambiguity

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location

Honolulu, HI

ISSN

1520-6149

Print_ISBN

1-4244-0727-3

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2007.366611

Filename

4217011