Regional Information Center for Science and Technology - N-Best Tokenization in a GMM-SVM Language Identification System

DocumentCode :

2704685

Title :

N-Best Tokenization in a GMM-SVM Language Identification System

Author :

Xi Yang ; Manhung Siu

Author_Institution :

Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China

Volume :

fYear :

2007

fDate :

15-20 April 2007

Abstract :

N-best or lattice-based tokenization has been widely used in speech-related classification tasks. In this paper, we extend the n-best tokenization approach to GMM-based language identification systems with either maximum likelihood (ML) trained or SVM-based language models. We explore the effect of n-best tokenization in training or testing, and its interaction with n-gram order and system fusion. We show that n-best tokenization improves performance for both systems. However, the SVM-based system benefits from n-best tokenization in both training and testing, while the ML-trained system benefits only from n-best training. Results show that n-best tokenization reduces the EER of our best GMM-SVM system by about 5% relative on the 30s and 10s tests.
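
The abstract reports its gain as a relative reduction in equal error rate (EER) rather than an absolute one. As a minimal sketch of that arithmetic, the snippet below computes a relative reduction from two EER values. The function name `relative_reduction` and the EER figures are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative reduction of new_eer with respect to baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values in percent (assumptions, not results from the paper).
baseline = 10.0    # EER with 1-best tokenization
with_nbest = 9.5   # EER with n-best tokenization

# An absolute drop of 0.5 points on a 10% baseline is a 5% relative reduction.
print(f"{relative_reduction(baseline, with_nbest):.1%}")
```

Under these assumed numbers, a 5% relative reduction corresponds to an absolute change of only 0.5 percentage points, which is why the two ways of reporting should not be confused.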

Keywords :

Gaussian processes; maximum likelihood estimation; speech processing; speech recognition; support vector machines; GMM-SVM language identification; N-best tokenization; language models; lattice-based tokenization; maximum likelihood; n-best training; speech-related classification tasks; Councils; History; Lattices; Maximum likelihood estimation; Natural languages; Smoothing methods; Speech recognition; Support vector machines; System testing; Language Identification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location :

Honolulu, HI

ISSN :

1520-6149

Print_ISBN :

1-4244-0727-3

Type :

conf

DOI :

10.1109/ICASSP.2007.367242

Filename :

4218273

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2704685