Title :
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
Author :
Xiao, Xiong ; Li, Jinyu ; Chng, Eng Siong ; Li, Haizhou ; Lee, Chin-Hui
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to a model's ability to perform well on unseen testing data drawn from the same distribution as the training data, we show that good generalization capability is also desirable in mismatched cases. One way to obtain such general models is to use a margin-based model training method, e.g., soft-margin estimation (SME), which enhances the margins between competing models and thereby provides some tolerance to acoustic mismatches without detailed knowledge of the distortion mechanisms. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low-to-medium mismatched testing cases with no language model constraints. Recognition results also show that SME performs better with mean and variance normalization than without it, and therefore provides a complementary benefit to conventional feature normalization techniques, so that the two can be combined to further improve system performance. Although this study focuses on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with other types of distortions and robustness issues in other machine learning applications.
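As a rough illustration of what "enhancing margins between competing models" means, the sketch below computes a soft-margin objective of the form described in the SME literature: a term λ/ρ that rewards a large margin ρ, plus the average hinge-style loss (ρ − d_i) over training utterances whose separation d_i falls below ρ. It assumes each d_i (for example, a log-likelihood ratio between the correct transcription and its best competitor) has already been computed. The function name and inputs are hypothetical, and this is not the paper's training code, which also updates the HMM parameters that determine each d_i.

```python
from typing import Sequence


def soft_margin_loss(separations: Sequence[float], rho: float, lam: float) -> float:
    """Soft-margin objective: lam / rho + mean over i of max(0, rho - d_i).

    Hypothetical sketch, not the paper's implementation.
    separations: precomputed per-utterance separation scores d_i between
        the correct model and its best competing model.
    rho: soft margin (must be positive).
    lam: weight trading off margin size against empirical loss.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    if not separations:
        raise ValueError("need at least one separation score")
    # Utterances separated by at least rho contribute zero loss; those inside
    # the margin (or misrecognized, d_i < 0) are penalized linearly.
    hinge = sum(max(0.0, rho - d) for d in separations) / len(separations)
    # The lam / rho term favors a larger margin, which is what is expected
    # to buy tolerance to mismatch between training and testing conditions.
    return lam / rho + hinge
```

For example, `soft_margin_loss([2.0, 0.5, -1.0], rho=1.0, lam=0.1)` evaluates to 0.1 + 2.5/3 ≈ 0.933. Increasing ρ shrinks the λ/ρ term but pulls more utterances inside the margin, so training balances the two.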
Keywords :
learning (artificial intelligence); speech recognition; statistical analysis; Aurora-2 connected digit string recognition tasks; Aurora-3 connected digit string recognition tasks; acoustic mismatches; acoustic models; generalization capability; machine learning; margin-based learning framework; margin-based model training method; noise distortions; noisy speech recognition; normalization techniques; robust speech recognition; soft-margin estimation; statistical learning theory; unseen testing data; variance normalization; Aurora task; discriminative training; large margin
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2009.2031236