DocumentCode :
3163055
Title :
Easy does it: Robust spectro-temporal many-stream ASR without fine tuning streams
Author :
Ravuri, Suman V. ; Morgan, Nelson
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4309
Lastpage :
4312
Abstract :
Previous work has shown that spectro-temporal features reduce the word error rate for automatic speech recognition under noisy conditions. These systems, however, required significant hand-tuning to determine which spectral and temporal modulations should be included in a particular stream. In this work, each stream is restricted to a single spectral and temporal modulation, and the streams' posterior probabilities are combined once each stream has been discriminatively trained via a multilayer perceptron. We show that this combination structure performs as well as or better than more elaborate methods in which multiple spectral and temporal modulations are hand-picked per stream. In addition, these types of features outperform standard noise-robust features such as the “Advanced Front End” features, whereas our hand-picked spectro-temporal features do not.
Keywords :
multilayer perceptrons; speech recognition; advanced front end features; automatic speech recognition; fine tuning streams; hand-picked spectro-temporal features; multilayer perceptron; noisy conditions; robust spectro-temporal many-stream ASR; Accuracy; Mel frequency cepstral coefficient; Modulation; Noise; Noise measurement; Spectrogram; Speech recognition; automatic speech recognition; spectrotemporal features;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288872
Filename :
6288872