Title :
Time-frequency masking for speaker of interest extraction in an immersive environment
Author :
Unnikrishnan, Harikrishnan ; Donohue, Kevin D. ; Hannemann, Jens
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Kentucky, Lexington, KY, USA
Abstract :
Distributed microphone systems can be used to enhance intelligibility for a speaker of interest (SOI) in a noisy environment of multiple speech sources (cocktail party scenario). For finite microphone distributions, however, interfering speech sources leak into the beamformed signal and degrade intelligibility. This article introduces an auditory-inspired post-processing algorithm for beamformed signals using spectro-temporal cues to enhance SOI intelligibility. Spatial power ratios obtained through beamforming on multiple locations are used to identify and mask out time-frequency regions dominated by the interfering speech. Performance results based on planar microphone array simulations show consistent increases in the Speech Intelligibility Index (SII) over the beamformed signal for various configurations of speakers using 2 to 16 microphones. In cases of critically low SII (< 0.25), the application of interference masking achieves critical enhancements in SII, increasing it to values ranging from beyond 0.3 for the case of 2 microphones to above 0.5 for the 16-microphone case. Experimental recordings were also performed and examples are presented. The experimental recordings show improvements consistent with the simulation.
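The masking step described in the abstract can be illustrated with a minimal sketch. It assumes that beamformed power spectrograms are already available for the SOI location and for each interferer location, and that a time-frequency bin is kept when the SOI power exceeds the strongest interferer power by a fixed ratio. The function names, the binary (0/1) mask, and the default threshold of 1.0 are illustrative assumptions; the abstract does not state the exact ratio definition or threshold used by the authors.

```python
def binary_mask(soi_power, interferer_powers, threshold=1.0, eps=1e-12):
    """Build a 0/1 mask over time-frequency bins (hypothetical helper).

    soi_power: 2-D list [frame][bin], power of the beam steered at the SOI.
    interferer_powers: non-empty list of 2-D lists, one per interferer location.
    A bin is kept (1) when SOI power / strongest interferer power > threshold.
    """
    mask = []
    for t, frame in enumerate(soi_power):
        row = []
        for f, p_soi in enumerate(frame):
            # Strongest competing source in this time-frequency bin.
            p_int = max(p[t][f] for p in interferer_powers)
            ratio = p_soi / (p_int + eps)  # eps guards against division by zero
            row.append(1 if ratio > threshold else 0)
        mask.append(row)
    return mask


def apply_mask(spectrogram, mask):
    """Zero out the bins of the beamformed spectrogram where the mask is 0."""
    return [[x * m for x, m in zip(frame, mrow)]
            for frame, mrow in zip(spectrogram, mask)]


# Toy example: 2 frames x 2 frequency bins, 2 interferer locations.
soi = [[4.0, 1.0], [0.5, 9.0]]
interferers = [
    [[1.0, 2.0], [2.0, 3.0]],
    [[0.5, 0.1], [0.1, 1.0]],
]
mask = binary_mask(soi, interferers)
masked = apply_mask(soi, mask)
```

In this toy input, the bins where an interferer beam is stronger than the SOI beam are zeroed and the others pass through unchanged. A practical system would compute the power values from short-time Fourier transforms of the beamformer outputs and might use a soft mask instead of a hard one.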
Keywords :
array signal processing; microphone arrays; speech intelligibility; speech processing; SII; SOI intelligibility; auditory inspired post-processing algorithm; beamformed signals; beamforming; distributed microphone systems; finite microphone distributions; interference masking; interfering speech sources; planar microphone array simulations; spatial power ratios; speaker of interest extraction; spectrotemporal cues; speech intelligibility index; time-frequency masking; Array signal processing; Indexes; Interference; Microphones; Reverberation; Speech; Time-frequency analysis;
Conference_Title :
SOUTHEASTCON 2014, IEEE
Conference_Location :
Lexington, KY
DOI :
10.1109/SECON.2014.6950651