Data collection for mobile audio-visual speech recognition in various environments

Author

Tamura, Satoshi ; Seko, Takumi ; Hayamizu, Satoru

Author_Institution

Dept. Electr., Electron. & Comput. Eng., Gifu Univ., Gifu, Japan

fYear

2014

Firstpage

1

Lastpage

6

Abstract

This paper introduces our recent work on audio-visual speech recognition for mobile devices and on data collection in various environments. Audio-visual automatic speech recognition is effective in noisy or real-world conditions, where it enhances the robustness of the speech recognizer and improves recognition accuracy. We have developed an audio-visual speech recognition interface for mobile devices. To evaluate the recognizer and investigate issues related to audio-visual processing on mobile computers, we collected speech data and lip images from 16 subjects in eight conditions involving various audio noises and visual difficulties. Audio-only speech recognition and visual-only lipreading experiments were then conducted. Through these experiments, we identified several issues and directions for future work, not only for constructing an audio-visual database but also for robust audio-visual speech recognition.

Keywords

audio-visual systems; mobile computing; speech recognition; audio noises; audio-only speech recognition; audio-visual automatic speech recognition; audio-visual database; audio-visual processing; audio-visual speech recognition interface; data collection; mobile audio-visual speech recognition; mobile computers; mobile devices; visual difficulties; visual-only lipreading; Hidden Markov models; Noise; Quantization (signal); Robustness; Visualization

fLanguage

English

Publisher

IEEE

Conference_Titel

2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)

Type

conf

DOI

10.1109/ICSDA.2014.7051434

Filename

7051434