Regional Information Center for Science and Technology (RICeST) - Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR

DocumentCode :

1118356

Title :

Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR

Author :

Yang, Chen ; Soong, Frank K. ; Lee, Tan

Author_Institution :

Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin

Volume :

Issue :

fYear :

2007

fDate :

3/1/2007 12:00:00 AM

Firstpage :

1087

Lastpage :

1097

Abstract :

In this paper, we investigate the relative noise robustness of dynamic and static spectral features in speech recognition. It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart. The results are consistent across different types of noise and over a wide range of noise levels. To exploit this unequal robustness, we propose a simple yet effective strategy of exponentially weighting the likelihoods that are contributed by the static and dynamic features during the decoding process. The optimal weights are discriminatively trained with a small amount of development data. This method is evaluated on two speaker-independent, connected digit databases, one in English (Aurora 2) and the other in Cantonese (CUDIGIT). For various types of noise at different signal-to-noise ratios (SNRs), the average relative word error rate reductions attained with the discriminatively trained weights are 36.6% and 41.9% for Aurora 2 and CUDIGIT, respectively. Noticeable performance improvement can be observed even when there is channel distortion. The proposed approach is appealing for practical applications because 1) noise estimation is not required, 2) model adaptation is not required, 3) only a minor modification of the decoding process is needed, and 4) only a few feature weights need to be trained.
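
The exponential weighting described in the abstract can be illustrated with a minimal sketch. Raising each stream's likelihood to a power is the same as scaling its log-likelihood, so the combined score is a weighted sum of the static and dynamic log-likelihoods. The function names, the diagonal-Gaussian observation model, and the numeric values below are assumptions made for illustration; the record does not give the paper's exact formulation or trained weights.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x.

    Assumed observation model, chosen only for illustration.
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_stream_loglik(static_ll, dynamic_ll, w_static, w_dynamic):
    """Combine per-stream log-likelihoods with exponent weights.

    b(o) = b_s(o_s)**w_static * b_d(o_d)**w_dynamic, so in the log
    domain the weights multiply the log-likelihoods.
    """
    return w_static * static_ll + w_dynamic * dynamic_ll


# Hypothetical values: one static and one dynamic coefficient.
static_ll = diag_gaussian_loglik([0.3], [0.0], [1.0])
dynamic_ll = diag_gaussian_loglik([0.1], [0.0], [1.0])

# Equal weights give the usual unweighted score; a smaller static
# weight reduces the influence of the less noise-robust stream.
baseline = weighted_stream_loglik(static_ll, dynamic_ll, 1.0, 1.0)
reweighted = weighted_stream_loglik(static_ll, dynamic_ll, 0.5, 1.0)
```

In a decoder, a score of this form would replace the per-frame state log-likelihood, which matches the abstract's point that only a minor change to decoding is needed. How the weights are trained discriminatively is not covered by this sketch.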

Keywords :

cepstral analysis; error statistics; speech recognition; ASR; Aurora 2; CUDIGIT; additive noise; automatic speech recognition; average relative word error rate reductions; channel distortion; dynamic cepstrum; dynamic spectral feature; optimal weights; relative noise robustness; signal-to-noise ratios; speaker-independent connected digit databases; static spectral feature; Additive noise; Automatic speech recognition; Cepstrum; Decoding; Noise level; Noise reduction; Noise robustness; Signal to noise ratio; Spatial databases; Speech recognition; Discriminative training; dynamic features; exponential weighting; noise robustness;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2006.885932

Filename :

4100703

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1118356