Title :
Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion
Author :
Tariquzzaman, Md ; Gyu, Song Min ; Young, Kim Jin ; You, Na Seung ; Rashid, M.A.
Author_Institution :
Sch. of Electron. & Comput. Eng., Chonnam Nat. Univ., Gwangju, South Korea
Abstract :
In state-of-the-art ASR technology, speech recognition based on audio and video (AV) information is one of the key challenges in coping with noise. AV fusion is one of the robust approaches to ASR. The main issues in AV fusion are where and how to integrate the information from the two modalities. To enhance AV fusion performance, [1] proposed optimum reliability fusion (ORF) and applied it to AV speaker identification. In this paper we adopt ORF-based fusion for AV speech recognition and evaluate the performance improvement in that domain. The main idea of ORF is to introduce weighting factors into the score-based reliability measure (SCRM) in order to solve the over- or under-estimation problem in the SCRM calculation. Our AV speech recognition system is implemented for Korean digit recognition using the SAMSUMG AV database. Experimental results show that ORF achieves a relative error rate reduction of 42.8% compared with the baseline system, which adopts the previous AV fusion scheme [2].
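The idea of weighting a score-based reliability measure before fusing the two modalities can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the reliability definition (gap between the best class score and the mean of the remaining scores), the function names `reliability` and `fuse`, the weighting factors `w_audio` and `w_video` (treated here as given constants rather than optimized values), and the example scores.

```python
def reliability(scores):
    """Hypothetical score-based reliability: gap between the best score
    and the mean of the remaining scores (larger gap = more confident)."""
    ordered = sorted(scores, reverse=True)
    rest = ordered[1:]  # requires at least two classes
    return ordered[0] - sum(rest) / len(rest)


def fuse(audio_scores, video_scores, w_audio=1.0, w_video=1.0):
    """Late fusion of per-class scores from the two modalities.

    w_audio and w_video are illustrative weighting factors that rescale
    each modality's reliability to counter over- or under-estimation.
    """
    r_a = w_audio * reliability(audio_scores)
    r_v = w_video * reliability(video_scores)
    total = r_a + r_v
    # Fall back to equal weighting when neither modality is informative.
    lam = 0.5 if total == 0 else r_a / total
    fused = [lam * a + (1.0 - lam) * v
             for a, v in zip(audio_scores, video_scores)]
    return fused, lam


# Made-up log-likelihood-style scores for three classes:
# the audio scores are nearly flat (noisy), the video scores favor class 1.
audio = [-10.0, -10.5, -11.0]
video = [-12.0, -8.0, -13.0]

fused, lam = fuse(audio, video)
best = max(range(len(fused)), key=lambda i: fused[i])
print(lam, best)
```

In this sketch the flat audio scores yield a small reliability, so the fused decision follows the video stream; raising `w_audio` shifts the fusion weight back toward audio.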
Keywords :
audio-visual systems; speech recognition; ASR technology; Korean digit recognition; SAMSUMG AV database; audio-visual speech recognition; optimal reliability fusion; score-based reliability measure; speaker identification; Databases; Educational institutions; Optimization; Robustness; Speech recognition; Visualization; Optimum Reliability fusion; Particle Swarm Optimization; Speech Recognition;
Conference_Title :
Internet Computing & Information Services (ICICIS), 2011 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4577-1561-7
DOI :
10.1109/ICICIS.2011.58