Title :
Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues
Author :
Woodruff, Jonathan; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fDate :
4/1/2013
Abstract :
We propose an approach to binaural detection, localization and segregation of speech based on pitch and azimuth cues. We formulate the problem as a search through a multisource state space across time, where each multisource state encodes the number of active sources, and the azimuth and pitch of each active source. A set of multilayer perceptrons is trained to assign time-frequency units to one of the active sources in each multisource state based jointly on observed pitch and azimuth cues. We develop a novel hidden Markov model framework to estimate the most probable path through the multisource state space. An estimated state path encodes a solution to the detection, localization, pitch estimation and simultaneous organization problems. Segregation is then achieved with an azimuth-based sequential organization stage. We demonstrate that the proposed framework improves segregation relative to several two-microphone comparison systems that are based solely on azimuth cues. Performance gains are consistent across a variety of reverberant conditions.
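The search for the most probable path through a state space, as described in the abstract, can be illustrated with generic Viterbi decoding. The sketch below is not the authors' implementation. The four-state toy space, the (azimuth, pitch) values, the transition probabilities and the per-frame likelihoods are all hypothetical, and in the paper's setting the likelihoods would be derived from the multilayer perceptron outputs rather than set by hand.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Return the most probable HMM state path (log-domain Viterbi).

    log_init:  (S,)   log prior over states
    log_trans: (S, S) log_trans[i, j] = log P(state j at t | state i at t-1)
    log_obs:   (T, S) log likelihood of frame t under each state
    """
    T, S = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[i, j]: best score ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each state j
        score = cand[back[t], np.arange(S)] + log_obs[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):              # trace the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical multisource states: one (azimuth_deg, pitch_hz) pair per active source.
STATES = [
    (),                          # no active source
    ((-30, 120),),               # one source on the left
    ((30, 200),),                # one source on the right
    ((-30, 120), (30, 200)),     # two simultaneous sources
]
S = len(STATES)

# Assumed transition model: stay with probability 0.7, otherwise move uniformly.
trans = np.full((S, S), 0.1)
np.fill_diagonal(trans, 0.7)

# Assumed per-frame likelihoods: each frame strongly favors one state.
favored = [0, 1, 1, 3, 3]
obs = np.full((len(favored), S), 0.01)
obs[np.arange(len(favored)), favored] = 0.97

path = viterbi(np.log(np.full(S, 1.0 / S)), np.log(trans), np.log(obs))
decoded = [STATES[s] for s in path]
print(decoded)
```

Each decoded state gives, for one frame, the number of active sources (the tuple length) together with an azimuth and a pitch for each source, which is the sense in which a single path answers the detection, localization and pitch estimation questions at once.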
Keywords :
Markov processes; multilayer perceptrons; reverberation; speech processing; active sources; azimuth cues; azimuth-based sequential organization stage; binaural detection; estimated state path encoding; hidden Markov model framework; multisource state-space; pitch cues; pitch estimation; reverberant environment localization; reverberant environment segregation; simultaneous organization problems; time-frequency units; two-microphone comparison systems; Acoustics; Azimuth; Estimation; Hidden Markov models; Joints; Organizations; Speech; Binaural speech segregation; computational auditory scene analysis; multipitch tracking; sound localization; source detection
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2236316