DocumentCode
1463733
Title
A Two Microphone-Based Approach for Source Localization of Multiple Speech Sources
Author
Zhang, Wenyi ; Rao, Bhaskar D.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of California at San Diego, La Jolla, CA, USA
Volume
18
Issue
8
fYear
2010
Firstpage
1913
Lastpage
1928
Abstract
This paper proposes a two-microphone source localization technique for multiple speech sources that uses speech-specific properties and novel clustering algorithms. Voiced speech is sparse in the frequency domain and can be represented by sinusoidal tracks via sinusoidal modeling, which provides a high local signal-to-noise ratio (SNR). By using the inter-channel phase differences (IPDs) between the two channels on the sinusoidal tracks, the localization of multiple mixed speech sources is turned into a clustering problem on the IPD versus frequency plot. The generalized mixture decomposition algorithm (GMDA) is used to cluster the groups of points corresponding to the individual sources and thus to estimate the direction of arrival (DOA) of each source. Experiments illustrate that the proposed GMDA with the Laplacian noise model can estimate the number of sources accurately and exhibits smaller DOA estimation error than the baseline histogram-based DOA estimation algorithm in various scenarios, including reverberant and additive white noise environments. Experiments also suggest that, for the purpose of selecting time-frequency points with high local SNR, appropriate power thresholding can be a simple and good approximation to sinusoidal modeling, with a slight loss in performance.
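The link between IPD and DOA that the abstract relies on can be illustrated with a minimal single-source sketch. It is not the authors' GMDA: it assumes a far-field source, a hypothetical microphone spacing, a sound speed of 343 m/s, and replaces the clustering step with a plain grid search. The sum of absolute wrapped residuals is used as the cost because that is the natural fit criterion under a Laplacian noise assumption. All function and parameter names are illustrative and do not come from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def wrap_phase(phi):
    """Wrap phase values into the interval [-pi, pi)."""
    return (phi + np.pi) % (2.0 * np.pi) - np.pi


def ipd_model(freqs_hz, doa_rad, mic_spacing_m):
    """Wrapped far-field IPD for one source at angle doa_rad from broadside."""
    delay = mic_spacing_m * np.sin(doa_rad) / SPEED_OF_SOUND
    return wrap_phase(2.0 * np.pi * freqs_hz * delay)


def estimate_doa_grid(freqs_hz, ipds, mic_spacing_m, n_grid=361):
    """Grid search for the DOA minimizing the L1 cost of wrapped residuals.

    Comparing wrapped residuals lets frequencies above the spatial
    aliasing limit contribute to the fit as well.
    """
    candidates = np.linspace(-np.pi / 2.0, np.pi / 2.0, n_grid)
    costs = [
        np.sum(np.abs(wrap_phase(ipds - ipd_model(freqs_hz, th, mic_spacing_m))))
        for th in candidates
    ]
    return candidates[int(np.argmin(costs))]


if __name__ == "__main__":
    freqs = np.linspace(200.0, 4000.0, 50)  # hypothetical track frequencies
    spacing = 0.1                           # hypothetical 10 cm spacing
    observed = ipd_model(freqs, np.deg2rad(30.0), spacing)
    estimate = estimate_doa_grid(freqs, observed, spacing)
    print(np.rad2deg(estimate))
```

With several sources, the points on the IPD versus frequency plot fall along several such wrapped lines, which is why the abstract treats localization as a clustering problem rather than a single fit.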
Keywords
direction-of-arrival estimation; frequency-domain analysis; microphones; speech processing; DOA estimation error; GMDA; Laplacian noise model; additive white noise; clustering problem; direction-of-arrival estimation; frequency domain method; generalized mixture decomposition algorithm; interchannel phase differences; microphone-based source localization technique; multiple speech sources; signal-to-noise ratio; speech utilization; Additive white noise; Clustering algorithms; Direction of arrival estimation; Estimation error; Frequency domain analysis; Histograms; Laplace equations; Signal to noise ratio; Speech; Working environment noise; Clustering; direction of arrival (DOA) estimation; dual channel; generalized mixture decomposition algorithm (GMDA); sinusoidal modeling; source localization; sparsity; spatial aliasing; speech;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2010.2040525
Filename
5443663