DocumentCode
2972054
Title
Improving joint uncertainty decoding performance by predictive methods for noise robust speech recognition
Author
Xu, Haitian ; Gales, Mark J F ; Chin, K.K.
Author_Institution
Cambridge Res. Lab., Toshiba Res. Eur. Ltd., Cambridge, UK
fYear
2009
fDate
Nov. 13 2009-Dec. 17 2009
Firstpage
222
Lastpage
227
Abstract
Model-based noise compensation techniques, such as vector Taylor series (VTS) compensation, have been applied to a range of noise robustness tasks. However, one of the issues with these approaches is that they are computationally expensive for large speech recognition systems. To address this problem, schemes such as Joint Uncertainty Decoding (JUD) have been proposed. Though these are computationally more efficient, system performance is typically degraded. This paper proposes an alternative scheme, VTS-JUD, which is related to JUD but makes fewer approximations. Unfortunately, this approach also removes some of the computational advantages of JUD. To address this, rather than using VTS-JUD directly, it is used to obtain statistics to estimate a predictive linear transform, PCMLLR. This is both computationally efficient and limits some of the issues associated with the diagonal covariance matrices typically used with schemes such as VTS. PCMLLR can also be simply used within an adaptive training framework (PAT). The performance of the VTS-JUD, PCMLLR and PAT systems was compared with a number of standard approaches on an in-car speech recognition task. The proposed scheme is an attractive alternative to existing approaches.
Keywords
covariance matrices; decoding; speech recognition; adaptive training framework; diagonal covariance matrices; in-car speech recognition task; joint uncertainty decoding performance; model-based noise compensation techniques; noise robust speech recognition; predictive linear transform; vector Taylor series compensation; Acoustic noise; Computational efficiency; Decoding; Degradation; Hidden Markov models; Noise robustness; Speech recognition; Taylor series; Uncertainty; Working environment noise;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location
Merano
Print_ISBN
978-1-4244-5478-5
Electronic_ISBN
978-1-4244-5479-2
Type
conf
DOI
10.1109/ASRU.2009.5373317
Filename
5373317