DocumentCode
2290083
Title
Design and Implementation of 3D Auditory Scene Visualizer towards Auditory Awareness with Face Tracking
Author
Kubota, Yuji ; Yoshida, Masatoshi ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G.
Author_Institution
Grad. Sch. of Inf., Kyoto Univ., Kyoto
fYear
2008
fDate
15-17 Dec. 2008
Firstpage
468
Lastpage
476
Abstract
If machine audition can recognize an auditory scene containing simultaneous and moving talkers, what kinds of awareness will people gain from an auditory scene visualizer? This paper presents the design and implementation of 3D Auditory Scene Visualizer based on the visual information seeking mantra, i.e., “overview first, zoom and filter, then details on demand”. The machine audition system called HARK captures 3D sounds with a microphone array, localizes and separates sounds, and recognizes separated sounds by automatic speech recognition (ASR). The 3D visualizer implemented in Java 3D displays each sound stream as a beam originating from the center of the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom and filter mode), and shows detailed information about a particular sound stream (details on demand). In the details-on-demand mode, ASR results are displayed in a “karaoke” manner, i.e., character-by-character. This three-mode visualization will give the user auditory awareness enhanced by HARK. In addition, a face-tracking system automatically changes the focus of attention by tracking the user's face. The resulting system is portable and can be deployed in any place, so it is expected to give more vivid awareness than expensive high-fidelity auditory scene reproduction systems.
Keywords
Java; audio signal processing; data visualisation; speech recognition; 3D auditory scene visualizer; HARK; Java 3D displays; auditory awareness; automatic speech recognition; face tracking; machine audition; microphone array; visual information seeking mantra; Audio recording; Automatic speech recognition; Image analysis; Information filtering; Information filters; Layout; Microphone arrays; Music; Video recording; Visualization; Auditory awareness; Auditory scene visualizer; Face tracking;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
Conference_Location
Berkeley, CA
Print_ISBN
978-0-7695-3454-1
Electronic_ISBN
978-0-7695-3454-1
Type
conf
DOI
10.1109/ISM.2008.107
Filename
4741208