High resolution signal reconstruction

Author

Kristjansson, T.; Hershey, John

fYear

2003

fDate

30 Nov.-3 Dec. 2003

Firstpage

291

Lastpage

296

Abstract

We present a framework for speech enhancement and robust speech recognition that exploits the harmonic structure of speech. We achieve substantial gains in signal-to-noise ratio (SNR) of enhanced speech as well as considerable gains in accuracy of automatic speech recognition in very noisy conditions. The method employs a high-frequency-resolution speech model in the log-spectrum domain and reconstructs the signal from the estimated posteriors of the clean signal and the phases of the original noisy signal. We achieve a gain in SNR of 8.38 dB for enhancement of speech at 0 dB SNR. We also present recognition results on the Aurora 2 dataset. At 0 dB SNR, we achieve a relative reduction in word error rate of 43.75% over the baseline, and 15.90% over the equivalent low-resolution algorithm.
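
The reconstruction step described in the abstract (estimated clean spectrum combined with the phases of the noisy signal) and the SNR figure of merit can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the model output is a natural-log magnitude per frequency bin for a single frame, and the names `combine_magnitude_and_phase` and `snr_db` are hypothetical. The posterior estimation itself and the inverse transform back to a waveform are not shown.

```python
import cmath
import math


def combine_magnitude_and_phase(est_log_mag, noisy_spectrum):
    """Build one frame of an enhanced complex spectrum.

    est_log_mag: estimated clean log-magnitudes, one per frequency bin
                 (assumed here to be natural logs of magnitudes).
    noisy_spectrum: complex spectrum of the noisy frame, same length.
    The magnitude comes from the estimate, the phase from the noisy input.
    """
    if len(est_log_mag) != len(noisy_spectrum):
        raise ValueError("est_log_mag and noisy_spectrum must have equal length")
    return [
        cmath.rect(math.exp(log_mag), cmath.phase(noisy_bin))
        for log_mag, noisy_bin in zip(est_log_mag, noisy_spectrum)
    ]


def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the reference `clean` signal."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)
```

Under these assumptions, an SNR gain of the kind quoted above would be computed as `snr_db(clean, enhanced) - snr_db(clean, noisy)` on time-domain signals.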

Keywords

error statistics; parameter estimation; signal reconstruction; signal resolution; speech enhancement; speech recognition; SNR; automatic speech recognition; clean signal posterior estimation; high resolution signal reconstruction; log-spectrum domain; signal-to-noise ratio; word error rate; frequency estimation; phase estimation; phase noise; robustness

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on

Print_ISBN

0-7803-7980-2

Type

conf

DOI

10.1109/ASRU.2003.1318456

Filename

1318456