• DocumentCode
    2109086
  • Title

    Approaches to Language Identification Using Gaussian Mixture Model and Linear Discriminant Analysis

  • Author

    Zeng, Xiuhua ; Yang, Jian ; Xu, Dan

  • Author_Institution
    Sch. of Inf. Sci. & Eng., Yunnan Univ., Kunming
  • fYear
    2008
  • fDate
    21-22 Dec. 2008
  • Firstpage
    1109
  • Lastpage
    1112
  • Abstract
    The baseline PRLM system achieves the best performance on NIST language recognition evaluation tasks. However, it requires orthographically or phonetically transcribed utterances, which cannot be easily obtained for Chinese dialects and minority languages, so PRLM cannot readily be applied to these languages. To overcome this limitation, we present a Gaussian mixture model recognizer followed by a language-dependent language model (GMM-LM) as an approach to language identification. In this paper, we focus on finding the optimum number of frames for training each GMM parameter and on comparing two back-end processing approaches in the GMM-LM system. The experiments show that the LDA processing approach achieves an average accuracy of 78%, which is a 45% relative improvement over the simple approach on 30 s test data.
  • Keywords
    Gaussian processes; natural language processing; Chinese dialects; Gaussian mixture model; NIST language recognition evaluation tasks; back-end processing approach; language identification; language-dependent language model; linear discriminant analysis; minority languages; Feature extraction; Information retrieval; Information science; Information security; Information technology; Linear discriminant analysis; NIST; National security; Natural languages; Testing; GMM-LM; LDA; language identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application Workshops, 2008. IITAW '08. International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3505-0
  • Type

    conf

  • DOI
    10.1109/IITA.Workshops.2008.212
  • Filename
    4732132