DocumentCode :
2704685
Title :
N-Best Tokenization in a GMM-SVM Language Identification System
Author :
Xi Yang ; Manhung Siu
Author_Institution :
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
N-best or lattice-based tokenization has been widely used in speech-related classification tasks. In this paper, we extend the n-best tokenization approach to GMM-based language identification systems with either maximum likelihood (ML) trained or SVM-based language models. We explore the effect of n-best tokenization in training or testing, and its interaction with n-gram order and system fusion. We show that n-best tokenization improves performance for both systems. However, the SVM-based system benefits from n-best tokenization in both training and testing, while the ML-trained system benefits only from n-best training. Results show that n-best tokenization reduces the EER of our best GMM-SVM system by about 5% relative for the 30s and 10s tests.
Keywords :
Gaussian processes; maximum likelihood estimation; speech processing; speech recognition; support vector machines; GMM-SVM language identification; N-best tokenization; language models; lattice-based tokenization; maximum likelihood; n-best training; speech-related classification tasks; Councils; History; Lattices; Maximum likelihood estimation; Natural languages; Smoothing methods; Speech recognition; Support vector machines; System testing; Language Identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367242
Filename :
4218273