Time-frequency masking for speaker of interest extraction in an immersive environment

Author

Unnikrishnan, Harikrishnan ; Donohue, Kevin D. ; Hannemann, Jens

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Kentucky, Lexington, KY, USA

fYear

2014

fDate

13-16 March 2014

Firstpage

1

Lastpage

8

Abstract

Distributed microphone systems can be used to enhance intelligibility for a speaker of interest (SOI) in a noisy environment of multiple speech sources (cocktail party scenario). For finite microphone distributions, however, interfering speech sources leak into the beamformed signal and degrade intelligibility. This article introduces an auditory-inspired post-processing algorithm for beamformed signals that uses spectro-temporal cues to enhance SOI intelligibility. Spatial power ratios obtained through beamforming on multiple locations are used to identify and mask out time-frequency regions dominated by the interfering speech. Performance results based on planar microphone array simulations show consistent increases in the Speech Intelligibility Index (SII) over the beamformed signal for various configurations of speakers using 2 to 16 microphones. In cases of critically low SII (< 0.25), interference masking substantially improves the SII, raising it beyond 0.3 for the 2-microphone case and above 0.5 for the 16-microphone case. Experimental recordings were also performed, and examples are presented. The experimental recordings show improvements consistent with the simulation.

Keywords

array signal processing; microphone arrays; speech intelligibility; speech processing; SII; SOI intelligibility; auditory inspired post-processing algorithm; beamformed signals; beamforming; distributed microphone systems; finite microphone distributions; interference masking; interfering speech sources; planar microphone array simulations; spatial power ratios; speaker of interest extraction; spectrotemporal cues; speech intelligibility index; time-frequency masking; Array signal processing; Indexes; Interference; Microphones; Reverberation; Speech; Time-frequency analysis

fLanguage

English

Publisher

ieee

Conference_Titel

SOUTHEASTCON 2014, IEEE

Conference_Location

Lexington, KY

Type

conf

DOI

10.1109/SECON.2014.6950651

Filename

6950651