DocumentCode
3397486
Title
Modified estimation of between-class covariance matrix in linear discriminant analysis of speech
Author
Viszlay, Peter ; Juhar, Jozef ; Pleva, Matus
Author_Institution
Dept. of Electron. & Multimedia Commun., Tech. Univ. of Kosice, Kosice, Slovakia
fYear
2013
fDate
7-9 July 2013
Firstpage
167
Lastpage
170
Abstract
Linear discriminant analysis (LDA) is a popular supervised feature transformation applied in current automatic speech recognition (ASR). Generally, the parameters of LDA are computed from training data partitioned into classes. If the number of classes is smaller than the dimension of the supervectors (typical in phoneme-based LDA), the between-class covariance matrix can become singular or close to singular (the singularity problem in classical LDA). In this paper, we present a modification of the standard between-class covariance matrix estimation, which represents one possible approach to solving the singularity problem. Our method works directly with the supervectors instead of the class mean vectors. The number of estimation cycles is much larger because more data are used during the computation, so the matrix structure can be significantly refined. This implies that longer contexts can be used while the singularity problem is efficiently eliminated. The effectiveness of the proposed estimation is evaluated in Slovak phoneme-based and triphone-based large vocabulary continuous speech recognition (LVCSR) tasks. The method is compared to the state-of-the-art MFCCs and to LDA trained in the standard way. The experimental results confirm that the modified LDA considerably outperforms the MFCCs and consistently improves on the conventional LDA.
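The singularity problem described in the abstract can be illustrated with a minimal NumPy sketch. It computes only the standard between-class covariance matrix from class means and checks its rank; it does not reproduce the paper's modified estimator, whose formula is not given in this record. The function name, the synthetic data, and the sizes (5 classes, 39-dimensional vectors) are assumptions chosen for illustration.

```python
import numpy as np


def between_class_covariance(X, y):
    """Standard between-class covariance built from class mean vectors.

    X: (n_samples, dim) feature vectors; y: (n_samples,) integer class labels.
    """
    global_mean = X.mean(axis=0)
    dim = X.shape[1]
    S_b = np.zeros((dim, dim))
    for c in np.unique(y):
        X_c = X[y == c]
        # Deviation of the class mean from the global mean, as a column vector.
        diff = (X_c.mean(axis=0) - global_mean)[:, None]
        # Each class contributes one rank-1 term, weighted by its sample count.
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_b / X.shape[0]


# Hypothetical synthetic data: fewer classes than feature dimensions.
rng = np.random.default_rng(0)
n_classes, dim, per_class = 5, 39, 200
y = np.repeat(np.arange(n_classes), per_class)
class_offsets = 3.0 * rng.normal(size=(n_classes, dim))
X = rng.normal(size=(n_classes * per_class, dim)) + class_offsets[y]

S_b = between_class_covariance(X, y)
rank = np.linalg.matrix_rank(S_b)
print(f"dimension = {dim}, classes = {n_classes}, rank(S_b) = {rank}")
```

Because the matrix is a sum of one rank-1 term per class and the weighted mean deviations sum to zero, its rank cannot exceed the number of classes minus one, so it is singular whenever the class count is smaller than the vector dimension.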
Keywords
covariance matrices; speech recognition; ASR; LDA; LVCSR; MFCC; Slovak phoneme-based large vocabulary continuous speech recognition task; automatic speech recognition; class mean vectors; linear discriminant speech analysis; modified between-class covariance matrix estimation; singularity problem; supervectors; supervised feature transformation; triphone-based large vocabulary continuous speech recognition task; Bismuth; Context; Covariance matrices; Estimation; Hidden Markov models; Training; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Signals and Image Processing (IWSSIP), 2013 20th International Conference on
Conference_Location
Bucharest
ISSN
2157-8672
Print_ISBN
978-1-4799-0941-4
Type
conf
DOI
10.1109/IWSSIP.2013.6623480
Filename
6623480