Title :
Low-resource speech recognition with automatically learned sparse inverse covariance matrices
Author :
Zhang, Weibin ; Fung, Pascale
Author_Institution :
Dept. of Electron. & Comput. Eng., Univ. of Sci. & Technol., Hong Kong, China
Abstract :
Full covariance acoustic models trained with limited training data generalize poorly to unseen test data because of their large number of free parameters. We propose to use sparse inverse covariance matrices to address this problem. Previous sparse inverse covariance methods never outperformed full covariance methods. We propose a method that automatically drives the inverse covariance matrices toward sparsity during training. We use a new objective function, obtained by adding L1 regularization to the traditional objective function for maximum likelihood estimation. The graphic lasso method for estimating a sparse inverse covariance matrix is incorporated into the Expectation Maximization algorithm to learn the HMM parameters under the new objective function. Experimental results show that only about 25% of the parameters of the inverse covariance matrices need to be nonzero to achieve the same performance as a full covariance system. Our proposed system using sparse inverse covariance Gaussians also significantly outperforms a system using full covariance Gaussians trained on limited data.
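The L1-regularized objective mentioned in the abstract can be illustrated for a single Gaussian. The sketch below is not the authors' implementation: it assumes the standard graphical-lasso form, log det(P) - tr(S P) - rho * sum|P_ij|, up to constants and scaling, with the penalty applied to every entry including the diagonal. The names `penalized_objective`, `nonzero_fraction`, and `rho` are hypothetical. In an EM setting, `sample_cov` would presumably be the posterior-weighted sample covariance of one Gaussian component.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    # det(M) = prod(diag(L))^2, so log det(M) = 2 * sum(log(diag(L)))
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def penalized_objective(precision, sample_cov, rho):
    """Return log det(P) - tr(S P) - rho * sum_ij |P_ij|.

    Assumption: the L1 penalty covers all entries, diagonal included.
    """
    n = len(precision)
    trace_term = sum(
        sample_cov[i][j] * precision[j][i] for i in range(n) for j in range(n)
    )
    l1_norm = sum(abs(value) for row in precision for value in row)
    return log_det_spd(precision) - trace_term - rho * l1_norm


def nonzero_fraction(precision, tol=1e-12):
    """Fraction of free parameters (upper triangle incl. diagonal) that are nonzero."""
    n = len(precision)
    entries = [precision[i][j] for i in range(n) for j in range(i, n)]
    return sum(1 for value in entries if abs(value) > tol) / len(entries)
```

A larger `rho` penalizes off-diagonal entries more heavily, so the maximizer has more exact zeros. `nonzero_fraction` corresponds to the kind of sparsity figure the abstract reports (about 25% nonzero).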
Keywords :
acoustic signal processing; covariance matrices; expectation-maximisation algorithm; hidden Markov models; learning (artificial intelligence); matrix inversion; sparse matrices; speech recognition; HMM parameter learning; L1 regularization; automatically learned sparse inverse covariance matrices; expectation maximization algorithm; full covariance acoustic models; graphic lasso method; low resource speech recognition; maximum likelihood estimation; objective function; sparse inverse covariance Gaussians; training data; Covariance matrix; Hidden Markov models; Linear programming; Maximum likelihood estimation; Sparse matrices; Training; Training data; expectation maximization; graphic lasso; sparse inverse covariance matrix; speech recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288977