Title :
Sparse Inverse Covariance Matrices for Low Resource Speech Recognition
Author :
Zhang, Weibin; Fung, Pascale
Author_Institution :
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Abstract :
We propose to use sparse inverse covariance matrices for acoustic model training when training data are insufficient. Acoustic models trained with inadequate data tend to overfit and generalize poorly to unseen test data, especially when full covariance matrices are used. We address this problem by adding an L1 regularization term to the traditional maximum likelihood objective function, penalizing complex models. Under this new objective function, the structure of the inverse covariance matrices is automatically sparsified. The Expectation-Maximization algorithm is used to learn the parameters of the hidden Markov model under the new objective function. We show that the training procedures for all hidden Markov model parameters are identical to those of maximum likelihood estimation, except for the inverse covariance matrices, whose update reduces to a concave optimization problem that can be solved efficiently. Our experiments show that the proposed method correctly learns the underlying correlations among the random variables of the speech feature vector. Experimental results on the Wall Street Journal data show that the proposed model significantly outperforms the diagonal covariance model and the full covariance model, by 10.9% and 16.5% relative recognition accuracy respectively, when only about 14 hours of training data are available. On our collected low-resource language data, a Cantonese data set, the proposed model also significantly outperforms both the diagonal covariance model and the full covariance model.
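The core idea of the abstract, L1-penalized maximum-likelihood estimation of a sparse inverse covariance (precision) matrix, can be illustrated outside the HMM/EM setting with the closely related graphical lasso. The sketch below is not the authors' training procedure; it is a minimal standalone example, assuming scikit-learn's `GraphicalLasso` estimator and synthetic data with a known sparse precision structure.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Synthetic data from a chain-structured Gaussian: the true precision
# (inverse covariance) matrix is tridiagonal and therefore sparse.
n, d = 2000, 5
true_prec = np.eye(d) + 0.4 * (np.eye(d, k=1) + np.eye(d, k=-1))
cov = np.linalg.inv(true_prec)
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

# L1-penalized maximum-likelihood estimate of the precision matrix.
# The penalty weight alpha plays the role of the L1 regularization
# term described in the abstract; larger alpha gives a sparser result.
model = GraphicalLasso(alpha=0.05)
model.fit(X)
prec = model.precision_

# Entries far from the diagonal are shrunk toward zero, recovering
# the sparse conditional-independence structure of the data.
print(np.round(prec, 2))
```

In the HMM setting of the paper, an analogous penalized update is applied per Gaussian component inside EM; here the estimate is computed once from i.i.d. samples.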
Keywords :
covariance matrices; expectation-maximisation algorithm; hidden Markov models; speech recognition; Cantonese data set; L1 regularization term; acoustic model training; diagonal covariance model; expectation maximization algorithm; full covariance model; hidden Markov model; low resource language data; low resource speech recognition; maximum likelihood estimation; objective function; random variables; sparse inverse covariance matrices; training data; update equation; Acoustics; Covariance matrix; Data models; Hidden Markov models; Speech recognition; Training; Training data; Low resource; sparse inverse covariance matrix; speech recognition;
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2012.2221462