• DocumentCode
    2148230
  • Title
    Feature Selection Based on Mutual Information for Language Recognition
  • Author
    Deng, Yan; Liu, Jia
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • fYear
    2009
  • fDate
    17-19 Oct. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The prevailing approach to language recognition is parallel phoneme recognition followed by vector space modeling (PPRVSM), which uses a vector space model to describe the co-occurrence information of phones. Because the supervectors are built from phonetic N-grams, the number of N-grams, and hence the vector dimension, grows exponentially with the order N, which leads to data sparseness. In this paper, we propose a feature selection algorithm that addresses this problem by using a maximum-relevance criterion based on mutual information to select the N-grams that best discriminate between languages. The effectiveness of the technique is demonstrated on the NIST 2005 language recognition 30-second task, where the system achieves an equal error rate (EER) of 4.81%.
  • Keywords
    natural language processing; speech recognition; data sparseness; feature selection algorithm; language recognition; maximum relevance criteria; parallel phoneme recognition; phonetic N-Grams; super vectors; vector space modeling; Hidden Markov models; Information science; Laboratories; Lattices; Mutual information; Natural languages; Probability; Space technology; Support vector machine classification; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Image and Signal Processing, 2009. CISP '09. 2nd International Congress on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4244-4129-7
  • Electronic_ISBN
    978-1-4244-4131-0
  • Type
    conf
  • DOI
    10.1109/CISP.2009.5303829
  • Filename
    5303829
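The maximum-relevance N-gram selection described in the abstract can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: each utterance is represented as a decoded phone sequence, each N-gram is reduced to a binary present/absent feature, and the mutual information I(X; C) between that feature and the language label C is estimated from empirical counts. The functions `ngrams`, `mutual_information`, and `select_ngrams` and the toy phone data are hypothetical. With an inventory of P phones there are up to P^N distinct N-grams (for a hypothetical 50-phone set, 50^3 = 125,000 trigrams), which is why ranking the N-grams and keeping only the top k reduces sparseness.

```python
import math
from collections import Counter

# Hypothetical sketch: max-relevance N-gram selection via mutual information.
# Assumes each utterance is a decoded phone sequence and each N-gram is
# treated as a binary present/absent feature (not necessarily the paper's setup).


def ngrams(phones, n):
    """Return all phone N-grams (as tuples) in a phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def mutual_information(feature, labels):
    """Estimate I(X; C) in bits between a binary feature X and labels C
    from empirical joint and marginal counts."""
    total = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_c = Counter(labels)
    mi = 0.0
    for (x, c), count in joint.items():
        p_xc = count / total
        p_x = count_x[x] / total
        p_c = count_c[c] / total
        mi += p_xc * math.log2(p_xc / (p_x * p_c))
    return mi


def select_ngrams(utterances, labels, n, k):
    """Keep the k N-grams whose presence is most informative about the language."""
    present = [set(ngrams(u, n)) for u in utterances]
    vocabulary = sorted(set().union(*present))
    scores = {g: mutual_information([g in p for p in present], labels)
              for g in vocabulary}
    # Stable sort: ties keep lexicographic vocabulary order.
    ranked = sorted(vocabulary, key=lambda g: scores[g], reverse=True)
    return ranked[:k], scores


# Toy example with two hypothetical languages and made-up phone strings.
utterances = [list("abab"), list("abba"), list("cdcd"), list("cddc")]
labels = ["L1", "L1", "L2", "L2"]
selected, scores = select_ngrams(utterances, labels, n=2, k=4)
print(selected)
```

In the toy data, the bigrams that occur in every utterance of one language and never in the other each score 1 bit and are kept, while the bigrams seen in only one utterance score lower and are dropped. Ranking by relevance alone, as here, does not account for redundancy among the selected N-grams.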