DocumentCode :
809414
Title :
SNR-Adaptive Stream Weighting for Audio-MES ASR
Author :
Lee, Ki-Seung
Author_Institution :
Dept. of Electron. Eng., Konkuk Univ., Seoul
Volume :
55
Issue :
8
fYear :
2008
Firstpage :
2001
Lastpage :
2010
Abstract :
Myoelectric signals (MESs) from the speaker's mouth region have been shown to improve the noise robustness of automatic speech recognizers (ASRs), which promises to extend the usability of ASR in noisy environments. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, in which the likelihood score of the audio-MES observation vector was given by a weighted linear combination of the class-conditional observation log-likelihoods of the two classifiers. We developed a weighting process that adapts to the signal-to-noise ratio (SNR). The main objective of the paper is to determine the optimal SNR classification boundaries and to construct a set of optimum stream weights for each SNR class. Both sets of parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise (babble, car, aircraft, and white noise) were acoustically added to clean speech signals, with SNRs ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%, whereas that of the MES ASR was 85.2%. The proposed audio-MES weighting method improved the classification accuracy further, to as high as 89.4% in the case of babble noise. Similar results were found for the other types of noise.
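The decision fusion summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the SNR boundaries, and the weight values are hypothetical placeholders (the paper derives them with a maximum mutual information criterion, which is not shown here), and the sketch assumes that the two stream weights sum to one within each SNR class.

```python
import bisect

# Hypothetical SNR class boundaries (dB) and per-class audio weights.
# The paper estimates both with an MMI criterion; these are placeholders.
SNR_BOUNDARIES_DB = [0.0, 15.0]   # two boundaries give three SNR classes
AUDIO_WEIGHTS = [0.1, 0.5, 0.8]   # one audio-stream weight per SNR class


def audio_weight_for_snr(snr_db):
    """Return the audio-stream weight of the SNR class containing snr_db."""
    snr_class = bisect.bisect_right(SNR_BOUNDARIES_DB, snr_db)
    return AUDIO_WEIGHTS[snr_class]


def fuse_log_likelihoods(audio_ll, mes_ll, snr_db):
    """Linearly combine per-word log-likelihoods from the two classifiers.

    Assumption of this sketch: the MES weight is 1 minus the audio weight.
    """
    w = audio_weight_for_snr(snr_db)
    return [w * a + (1.0 - w) * m for a, m in zip(audio_ll, mes_ll)]


def classify(audio_ll, mes_ll, snr_db):
    """Return the index of the word with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, mes_ll, snr_db)
    return max(range(len(fused)), key=fused.__getitem__)
```

With these placeholder values, a low estimated SNR shifts the decision toward the MES classifier and a high SNR shifts it toward the audio classifier, which is the behavior an SNR-adaptive weighting scheme is meant to produce.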
Keywords :
electromyography; medical signal processing; speech recognition; SNR-adaptive stream weighting; acoustic noise; audio MES feature; audio-MES ASR; automatic speech recognizers; babble noise; decision fusion method; facial MES feature; myoelectric signals; Automatic speech recognition; Face recognition; Mouth; Mutual information; Noise robustness; Signal to noise ratio; Speech enhancement; Streaming media; Usability; Vectors; Automatic speech recognition; decision fusion; maximum mutual information (MMI) criterion; myoelectric signals (MESs); optimal weighting; Algorithms; Artificial Intelligence; Electromyography; Humans; Pattern Recognition, Automated; Speech Production Measurement; Speech Recognition Software;
fLanguage :
English
Journal_Title :
Biomedical Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9294
Type :
jour
DOI :
10.1109/TBME.2008.921094
Filename :
4567620