DocumentCode
2113535
Title
Robustifying cepstral features by mitigating the outlier effect for noisy speech recognition
Author
Hao-teng Fan ; Kuan-wei Hsieh ; Chien-Hao Huang ; Jeih-weih Hung
Author_Institution
Dept. of Electr. Eng., Nat. Chi Nan Univ., Puli, Taiwan
fYear
2013
fDate
23-25 July 2013
Firstpage
935
Lastpage
939
Abstract
The performance of automatic speech recognition (ASR) systems is often seriously degraded by noise interference. Among the techniques for reducing the effect of noise, cepstral mean-and-variance normalization (CMVN) is a simple yet effective approach to processing mel-frequency cepstral coefficient (MFCC) speech features. However, CMVN-processed features still contain a significant number of outliers, which are likely to weaken the effect of CMVN. This paper proposes two ways of dealing with the outliers left by CMVN. The first applies a sigmoid function transformation, which imposes explicit lower and upper bounds on the outliers; the second uses the well-known median filter to remove impulse-like outliers from the CMVN features. On the Aurora-2 digit recognition task, the two proposed frameworks yield an absolute accuracy improvement of around 5% over CMVN, and the corresponding word error rate reduction relative to the MFCC baseline is as high as 50%.
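The two outlier treatments named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the zero-centred sigmoid form, the `slope` parameter, the filter `width`, the edge handling, and all function names below are assumptions made for illustration only. Each function operates on the time trajectory of a single cepstral coefficient.

```python
import math
from statistics import mean, median, pstdev


def cmvn(track):
    """Normalize one coefficient trajectory to zero mean and unit variance."""
    mu = mean(track)
    sigma = pstdev(track)
    if sigma == 0:
        # A constant trajectory carries no variance to normalize.
        return [0.0 for _ in track]
    return [(x - mu) / sigma for x in track]


def sigmoid_bound(track, slope=1.0):
    """Squash values into [-1, 1] with a zero-centred sigmoid.

    Assumed form: tanh(slope * x / 2), which equals
    2 / (1 + exp(-slope * x)) - 1. The paper's exact function may differ.
    """
    return [math.tanh(slope * x / 2.0) for x in track]


def median_filter(track, width=3):
    """Sliding median over time; edges are handled by repeating end values."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(track)
    out = []
    for t in range(n):
        # Clamp indices so the window stays inside the trajectory.
        window = [track[min(max(t + k, 0), n - 1)]
                  for k in range(-half, half + 1)]
        out.append(median(window))
    return out


# Toy trajectory with one impulse-like outlier at index 3 (made-up values).
track = [0.1, -0.2, 0.0, 8.0, 0.1, -0.1, 0.2]
normalized = cmvn(track)
bounded = sigmoid_bound(normalized)
smoothed = median_filter(normalized, width=3)
```

In this sketch the sigmoid keeps every value but caps its magnitude, whereas the median filter replaces an isolated spike with a neighbouring value; which behaviour is preferable, and with what parameters, is a question the abstract does not settle.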
Keywords
audio databases; cepstral analysis; feature extraction; median filters; speech recognition; ASR systems; Aurora-2 digit recognition database; CMVN; MFCC speech feature processing; automatic speech recognition system; cepstral features robustification; cepstral mean-and-variance normalization; impulse-like outliers; median filter; noise effect; noise interference; noisy speech recognition; outlier effect mitigation; sigmoid function transformation; word error rate reduction; Frequency modulation; Fuzzy systems; Knowledge discovery; Mel frequency cepstral coefficient; Robustness; Signal to noise ratio; cepstral mean and variance normalization; median filter; noise robustness; sigmoid function; speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2013 10th International Conference on
Conference_Location
Shenyang
Type
conf
DOI
10.1109/FSKD.2013.6816329
Filename
6816329