Title :
Design and Implementation of 3D Auditory Scene Visualizer towards Auditory Awareness with Face Tracking
Author :
Kubota, Yuji ; Yoshida, Masatoshi ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G.
Author_Institution :
Grad. Sch. of Inf., Kyoto Univ., Kyoto
Abstract :
If machine audition can recognize an auditory scene containing simultaneous and moving talkers, what kinds of awareness will people gain from an auditory scene visualizer? This paper presents the design and implementation of a 3D Auditory Scene Visualizer based on the visual information seeking mantra, i.e., "overview first, zoom and filter, then details on demand". The machine audition system called HARK captures 3D sounds with a microphone array, localizes and separates sounds, and recognizes separated sounds by automatic speech recognition (ASR). The 3D visualizer implemented in Java 3D displays each sound stream as a beam originating from the center of the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom and filter mode), and shows detailed information about a particular sound stream (details-on-demand mode). In the details-on-demand mode, ASR results are displayed in a "karaoke" manner, i.e., character-by-character. This three-mode visualization will give the user auditory awareness enhanced by HARK. In addition, a face-tracking system automatically changes the focus of attention by tracking the user's face. The resulting system is portable and can be deployed in any place, so it is expected to give more vivid awareness than expensive high-fidelity auditory scene reproduction systems.
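The three modes named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java 3D implementation; the `SoundStream` record, its fields, the coordinate convention, and all function names are assumptions made for illustration and do not reflect HARK's actual output format.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundStream:
    """Hypothetical record for one localized source (not HARK's real format)."""
    stream_id: int
    azimuth_deg: float    # horizontal direction; 0 = straight ahead (assumed)
    elevation_deg: float  # vertical direction; 0 = horizontal plane (assumed)
    text: str = ""        # ASR result for this stream, if any


def beam_endpoint(stream, length=1.0):
    """Overview: tip (x, y, z) of a beam drawn from the array center at the origin."""
    az = math.radians(stream.azimuth_deg)
    el = math.radians(stream.elevation_deg)
    return (length * math.cos(el) * math.cos(az),
            length * math.cos(el) * math.sin(az),
            length * math.sin(el))


def in_focus(stream, center_deg, width_deg):
    """Zoom and filter: keep streams whose azimuth lies inside the focus sector."""
    # Wrap the angular difference into [-180, 180) so 350 and 0 are 10 degrees apart.
    diff = (stream.azimuth_deg - center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= width_deg / 2.0


def karaoke_frames(text):
    """Details on demand: successive prefixes, revealing one character at a time."""
    return [text[:i] for i in range(1, len(text) + 1)]
```

In such a sketch, a face tracker would only need to update `center_deg` as the user's head turns; how the actual system maps face position to the focus area is not specified in the abstract.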
Keywords :
Java; audio signal processing; data visualisation; speech recognition; 3D auditory scene visualizer; HARK; Java 3D displays; auditory awareness; automatic speech recognition; face tracking; machine audition; microphone array; visual information seeking mantra; Audio recording; Automatic speech recognition; Image analysis; Information filtering; Information filters; Layout; Microphone arrays; Music; Video recording; Visualization; Auditory awareness; Auditory scene visualizer; Face tracking;
Conference_Title :
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-0-7695-3454-1
Electronic_ISBN :
978-0-7695-3454-1
DOI :
10.1109/ISM.2008.107