Title :
Data collection for mobile audio-visual speech recognition in various environments
Author :
Tamura, Satoshi ; Seko, Takumi ; Hayamizu, Satoru
Author_Institution :
Dept. Electr., Electron. & Comput. Eng., Gifu Univ., Gifu, Japan
Abstract :
This paper introduces our recent work on audio-visual speech recognition for mobile devices and on data collection in various environments. Audio-visual automatic speech recognition is effective in noisy or real-world conditions, where it enhances the robustness of the speech recognizer and improves recognition accuracy. We have developed an audio-visual speech recognition interface for mobile devices. To evaluate the recognizer and investigate issues related to audio-visual processing on mobile computers, we collected speech data and lip images from 16 subjects in eight conditions that included various audio noises and visual difficulties. Audio-only speech recognition and visual-only lipreading experiments were then conducted. Through these experiments, we identified several issues and directions for future work, not only for the construction of audio-visual databases but also for robust audio-visual speech recognition.
Keywords :
audio-visual systems; mobile computing; speech recognition; audio noises; audio-only speech recognition; audio-visual automatic speech recognition; audio-visual database; audio-visual processing; audio-visual speech recognition interface; data collection; mobile audio-visual speech recognition; mobile computers; mobile devices; visual difficulties; visual-only lipreading; Hidden Markov models; Noise; Quantization (signal); Robustness; Visualization;
Conference_Title :
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)
DOI :
10.1109/ICSDA.2014.7051434