Approaches to Language Identification Using Gaussian Mixture Model and Linear Discriminant Analysis

Author

Zeng, Xiuhua ; Yang, Jian ; Xu, Dan

Author_Institution

Sch. of Inf. Sci. & Eng., Yunnan Univ., Kunming

fYear

2008

fDate

21-22 Dec. 2008

Firstpage

1109

Lastpage

1112

Abstract

The baseline PRLM (phone recognition followed by language modeling) system has the best performance on NIST language recognition evaluation tasks. However, this system needs orthographically or phonetically transcribed utterances, which cannot be easily obtained for Chinese dialects and minority languages, so the PRLM system cannot be applied to these languages. To overcome this limitation, we present a Gaussian mixture model recognizer followed by a language-dependent language model (GMM-LM) as an approach to language identification. In this paper, we focus on finding the optimum number of frames for training each GMM parameter and on comparing two back-end processing approaches in the GMM-LM system. The experiments show that the LDA processing approach achieves an average accuracy of 78%, which is a 45% relative improvement over the simple approach on 30 s test data.
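The abstract does not say whether the 45% relative improvement is measured on accuracy or on error rate, and it does not report the accuracy of the simple approach. The following is a minimal arithmetic sketch of what each reading would imply for that baseline; the function names are hypothetical and the resulting baseline figures are derived under these assumptions, not taken from the paper.

```python
def baseline_from_relative_accuracy_gain(accuracy, relative_gain):
    """Baseline accuracy, assuming the gain is measured on accuracy itself."""
    return accuracy / (1.0 + relative_gain)


def baseline_from_relative_error_reduction(accuracy, relative_reduction):
    """Baseline accuracy, assuming the gain is a reduction in error rate."""
    error = 1.0 - accuracy
    baseline_error = error / (1.0 - relative_reduction)
    return 1.0 - baseline_error


lda_accuracy = 0.78  # average accuracy reported in the abstract
relative = 0.45      # relative improvement reported in the abstract

# Reading 1 (assumption): 78% is 45% higher than the baseline accuracy.
print(f"{baseline_from_relative_accuracy_gain(lda_accuracy, relative):.3f}")    # 0.538

# Reading 2 (assumption): the 22% error is 45% lower than the baseline error.
print(f"{baseline_from_relative_error_reduction(lda_accuracy, relative):.3f}")  # 0.600
```

Under the first reading the simple approach would sit near 54% accuracy, under the second near 60%; the full paper would be needed to tell which one the authors intend.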

Keywords

Gaussian processes; natural language processing; Chinese dialects; Gaussian mixture model; NIST language recognition evaluation tasks; back-end processing approach; language identification; language-dependent language model; linear discriminant analysis; minority languages; Feature extraction; Information retrieval; Information science; Information security; Information technology; Linear discriminant analysis; NIST; National security; Natural languages; Testing; GMM-LM; LDA; language identification;

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Information Technology Application Workshops, 2008. IITAW '08. International Symposium on

Conference_Location

Shanghai

Print_ISBN

978-0-7695-3505-0

Type

conf

DOI

10.1109/IITA.Workshops.2008.212

Filename

4732132