DocumentCode :
3161959
Title :
Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition
Author :
Hurmalainen, Antti ; Virtanen, Tuomas
Author_Institution :
Tampere Univ. of Technol., Tampere, Finland
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4113
Lastpage :
4116
Abstract :
Non-negative spectral factorisation has been used successfully for separation of speech and noise in automatic speech recognition, both in feature-enhancing front-ends and in direct classification. In this work, we propose employing spectro-temporal 2D filters to model dynamic properties of Mel-scale spectrogram patterns in addition to static magnitude features. The results are evaluated using an exemplar-based sparse classifier on the CHiME noisy speech database. After optimisation of static features and modelling of temporal dynamics with derivative features, we achieve 87.4% average score over SNRs from 9 to -6 dB, reducing the word error rate by 28.1% from our previous static-only features.
Keywords :
filters; noise; optimisation; speech recognition; CHiME noisy speech database; Mel-scale spectrogram; SNR; direct classification; exemplar-based sparse classifier; factorisation-based noise-robust automatic speech recognition; feature-enhancing front-ends; noise separation; optimisation; spectro-temporal 2D filters; spectro-temporal dynamic model; speech separation; static magnitude features; Feature extraction; Noise; Noise measurement; Spectrogram; Speech; Speech recognition; Vectors; Automatic speech recognition; exemplar-based; noise robustness; spectral factorisation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288823
Filename :
6288823