Regional Information Center for Science and Technology - A Robust Method to Extract Talker Azimuth Orientation Using a Large-Aperture Microphone Array

DocumentCode :

1089116

Title :

A Robust Method to Extract Talker Azimuth Orientation Using a Large-Aperture Microphone Array

Author :

Levi, Avram ; Silverman, Harvey

Author_Institution :

Lab. for Eng. Man/Machine Syst. (LEMS), Brown Univ., Providence, RI, USA

Volume :

Issue :

fYear :

2010

Firstpage :

277

Lastpage :

285

Abstract :

Knowing the orientation of a talker in the focal area of a large-aperture microphone array enables the development of better beamforming algorithms (to obtain higher-quality speech output), improves source-location/tracking algorithms, and allows better selection and control of cameras in a video-conference situation. Measurements in an anechoic room (e.g., Chu and Warnock, 2002) have quantified the average frequency-dependent magnitude (source radiation pattern) of the human speech source, showing a front-to-back difference in magnitude that increases with frequency by about 8 dB/decade, reaching about 18 dB at 8000 Hz. These amplitude differences, although severely masked by both coherent and noncoherent noise in a real environment, are the most readily extractable cues to a talker's orientation when compared with other phenomena such as phase differences due to the source or diffraction effects at the mouth. In this paper, we propose a robust, source-radiation-pattern-based method for extracting the azimuth angle of a single talker for whom an accurate point-source location estimate is available. The method requires no a priori training and has been tested in more than 100 situations with real human talkers at various locations and orientations in a room equipped with a large-aperture microphone array. We compare these results against earlier published algorithms and find that the proposed method is the most robust and is adequate for consideration in a real-time system.
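The abstract does not spell out the algorithm, so the following is only a hypothetical illustration of the general idea of a radiation-pattern-based azimuth estimate, not the authors' method. Assumptions (none taken from the paper beyond the quoted figures): the front-to-back level difference follows a straight line of 8 dB/decade anchored at 18 dB at 8000 Hz (so about 10 dB at 800 Hz); the pattern shape in dB is cardioid-like, `-FB * (1 - cos(angle)) / 2`; per-microphone band levels are corrected for 1/r spherical spreading using the known talker location; and the azimuth is chosen by a least-squares grid search. The names `front_back_db`, `radiation_gain_db`, and `estimate_azimuth` are hypothetical.

```python
import numpy as np

def front_back_db(freq_hz, slope_db_per_decade=8.0, ref_db=18.0, ref_hz=8000.0):
    """Assumed front-to-back level difference (dB): a line in log-frequency
    through 18 dB at 8000 Hz with an 8 dB/decade slope, clipped at 0 dB."""
    fb = ref_db + slope_db_per_decade * np.log10(np.asarray(freq_hz) / ref_hz)
    return np.maximum(fb, 0.0)

def radiation_gain_db(angle, fb_db):
    """Hypothetical cardioid-like pattern: 0 dB straight ahead,
    -fb_db directly behind the talker."""
    return -fb_db * (1.0 - np.cos(angle)) / 2.0

def estimate_azimuth(source_xy, mic_xy, band_level_db, band_freqs, n_candidates=360):
    """Grid-search azimuth (radians) that best explains per-mic band levels.

    source_xy:     (2,) known talker position
    mic_xy:        (M, 2) microphone positions
    band_level_db: (M, B) measured level of each frequency band at each mic
    band_freqs:    (B,) band centre frequencies in Hz
    """
    d = np.asarray(mic_xy) - np.asarray(source_xy)
    r = np.linalg.norm(d, axis=1)
    phi = np.arctan2(d[:, 1], d[:, 0])  # direction from talker to each mic

    # Undo spherical spreading (1/r -> -20 log10 r dB) using the known location.
    levels = band_level_db + 20.0 * np.log10(r)[:, None]
    # The talker's absolute level is unknown, so compare only relative levels per band.
    levels = levels - levels.mean(axis=0, keepdims=True)

    fb = front_back_db(band_freqs)
    candidates = np.linspace(-np.pi, np.pi, n_candidates, endpoint=False)
    best_theta, best_err = None, np.inf
    for theta in candidates:
        model = radiation_gain_db(phi[:, None] - theta, fb[None, :])
        model = model - model.mean(axis=0, keepdims=True)
        err = np.sum((levels - model) ** 2)  # least-squares mismatch
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta
```

A noise-free synthetic check: simulate levels for a talker facing 40 degrees and confirm that the estimate lands on the nearest 1-degree grid point. Real rooms add the coherent and noncoherent noise the abstract mentions, which this sketch makes no attempt to handle.

```python
mics = np.array([[3 * np.cos(a), 2.5 * np.sin(a)]
                 for a in np.linspace(0, 2 * np.pi, 16, endpoint=False)])
src = np.array([0.3, -0.2])
freqs = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
d = mics - src
phi = np.arctan2(d[:, 1], d[:, 0])
true = np.deg2rad(40.0)
levels = (radiation_gain_db(phi[:, None] - true, front_back_db(freqs)[None, :])
          - 20 * np.log10(np.linalg.norm(d, axis=1))[:, None] + 60.0)
print(np.rad2deg(estimate_azimuth(src, mics, levels, freqs)))
```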

Keywords :

array signal processing; feature extraction; microphone arrays; speech processing; anechoic room; average frequency-dependent magnitude; beamforming algorithms; cameras; coherent noise; frequency 8000 Hz; human speech source; large-aperture microphone array; noncoherent noise; point-source location estimation; source location-tracking algorithms; source-radiation-pattern-based method; talker azimuth orientation extraction method; video conference; Beamforming; head orientation; head pose; microphone array; sensor network; speech acquisition;

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2025793

Filename :

5089426

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1089116