Title :
N-Best Tokenization in a GMM-SVM Language Identification System
Author :
Xi Yang ; Manhung Siu
Author_Institution :
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
Abstract :
N-best or lattice-based tokenization has been widely used in speech-related classification tasks. In this paper, we extend the n-best tokenization approach to GMM-based language identification systems with either maximum likelihood (ML) trained or SVM-based language models. We explore the effect of n-best tokenization in training and testing, and its interaction with n-gram order and system fusion. We show that n-best tokenization yields a useful performance improvement for both systems. However, the SVM-based system benefits from n-best tokenization in both training and testing, whereas the ML-trained system benefits only from n-best training. Results show that n-best tokenization reduces the relative EER of our best GMM-SVM system by about 5% for both the 30 s and 10 s test conditions.
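As a rough illustration of the kind of pipeline the abstract describes, the sketch below treats the top-N GMM components per frame as the n-best "tokens", accumulates posterior-weighted bigram counts into a per-utterance vector, and trains a linear SVM on those vectors. It is not the authors' implementation: the GMM size, feature dimension, weighting scheme, and all function names (top_n_tokens, bigram_vector) are illustrative assumptions; real systems typically use much larger GMM tokenizers and shifted-delta-cepstral features.

# Minimal sketch (not the paper's code) of n-best GMM tokenization
# feeding an SVM language model.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def top_n_tokens(gmm, frames, n_best=3):
    """For each frame, return the indices of the n best GMM components
    (the n-best 'tokens') and their posteriors as weights."""
    post = gmm.predict_proba(frames)              # (T, M) component posteriors
    idx = np.argsort(post, axis=1)[:, -n_best:]   # top-n component indices per frame
    wts = np.take_along_axis(post, idx, axis=1)   # corresponding posterior weights
    return idx, wts

def bigram_vector(idx, wts, n_components):
    """Accumulate posterior-weighted bigram counts over all n-best token
    pairs of consecutive frames; normalise to a unit-sum vector."""
    vec = np.zeros(n_components * n_components)
    for t in range(len(idx) - 1):
        for a, wa in zip(idx[t], wts[t]):
            for b, wb in zip(idx[t + 1], wts[t + 1]):
                vec[a * n_components + b] += wa * wb
    s = vec.sum()
    return vec / s if s > 0 else vec

# Toy usage with synthetic data standing in for real acoustic features.
M = 16                                            # GMM tokenizer size (illustrative)
gmm = GaussianMixture(n_components=M, covariance_type='diag', random_state=0)
gmm.fit(np.random.randn(2000, 13))                # stand-in for cepstral training frames

utt_feats, labels = [], []
for lang in (0, 1):                               # two toy 'languages'
    for _ in range(20):
        frames = np.random.randn(300, 13) + lang  # stand-in utterance
        idx, wts = top_n_tokens(gmm, frames, n_best=3)
        utt_feats.append(bigram_vector(idx, wts, M))
        labels.append(lang)

svm = LinearSVC(C=1.0).fit(np.vstack(utt_feats), labels)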
Keywords :
Gaussian processes; maximum likelihood estimation; speech processing; speech recognition; support vector machines; GMM-SVM language identification; n-best tokenization; lattice-based tokenization; language models; n-best training; speech-related classification tasks; lattices; natural languages; smoothing methods; language identification
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367242