DocumentCode :
2291670
Title :
An evaluation of sound source identification with RWCP sound scene database in real acoustic environments
Author :
Nishiura, Takanobu ; Nakamura, Satoshi
Author_Institution :
Fac. of Syst. Eng., Wakayama Univ., Japan
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
265
Abstract :
A hands-free speech interface must capture distant speech with high quality. A microphone array is an ideal candidate for this purpose, but it requires localizing the target talker. In environments with multiple sound sources, conventional talker localization methods have difficulty both localizing the sources accurately and identifying the target talker among the known source positions. To cope with these problems, we propose a new talker localization method consisting of two algorithms. The first localizes multiple sound sources based on CSP (cross-power spectrum phase) analysis. The second identifies which of the localized sources is the talker. We focus on the second algorithm, which performs statistical sound source identification using a microphone array together with speech and environmental sound models based on GMMs (Gaussian mixture models). We evaluate the performance of the proposed algorithms with the RWCP sound scene database in real acoustic environments (RWCP-DB).
Keywords :
acoustic arrays; acoustic signal processing; source separation; spectral analysis; speech processing; statistical analysis; Gaussian mixture models; RWCP sound scene database; cross-power spectrum phase analysis; distant speech; hands-free speech interface; microphone array; multiple sound source; real acoustic environments; sound source identification; target talker localization; Acoustical engineering; Automatic speech recognition; Data engineering; Databases; Layout; Microphone arrays; Natural languages; Phased arrays; Speech enhancement; Systems engineering and theory;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7803-7304-9
Type :
conf
DOI :
10.1109/ICME.2002.1035570
Filename :
1035570