DocumentCode :
103931
Title :
Robust tri-modal automatic speech recognition for consumer applications
Author :
Anderson, S.J. ; Fong, A.C.M. ; Jie Tang
Author_Institution :
Auckland Univ. of Technol., Auckland, New Zealand
Volume :
59
Issue :
2
fYear :
2013
fDate :
May-13
Firstpage :
352
Lastpage :
360
Abstract :
Commercial automatic speech recognition (ASR) started to appear in the late 1980s and can offer a more natural means of accepting user inputs than methods such as typing on keyboards or touch screens. This is a particularly important consideration for small consumer devices such as smartphones. In many practical situations, however, performance of ASR can be significantly compromised due to ambient noise and variable lighting conditions. Previous research has shown that adding visual cues to standard ASR can mitigate the effects of ambient noise. However, audiovisual (AV) ASR is not robust against variable lighting conditions, which are often encountered by users of consumer devices. Since thermal imaging is invariant to changing lighting conditions, the authors propose a trimodal thermal-audiovisual (TAV) ASR using adaptations of established techniques such as MT, DCT and MFCC. Experimental results demonstrate the robustness of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only ASR and +11.8% over AVASR recognition rates. The authors believe that robust ASR will lead to improved user experiences.
Keywords :
speech recognition; ASR; TAV; ambient noise; consumer applications; consumer devices; keyboards; lighting conditions; robust trimodal automatic speech recognition; smartphones; thermal imaging; touch screens; trimodal thermal-audiovisual; variable lighting conditions; Band-pass filters; Lighting; Noise; Speech; Standards; Videos; Visualization; Speech recognition; audiovisual processing; environment adaptation; speaker adaptation; voice control for consumer devices
fLanguage :
English
Journal_Title :
Consumer Electronics, IEEE Transactions on
Publisher :
IEEE
ISSN :
0098-3063
Type :
jour
DOI :
10.1109/TCE.2013.6531117
Filename :
6531117