Feature Selection Based on Mutual Information for Language Recognition

Author

Deng, Yan ; Liu, Jia

Author_Institution

Dept. of Electron. Eng., Tsinghua Univ., Beijing, China

fYear

2009

fDate

17-19 Oct. 2009

Firstpage

1

Lastpage

4

Abstract

The prevailing system for language recognition is parallel phoneme recognition followed by vector space modeling (PPRVSM), which uses a vector space model to describe the co-occurrence information of phones. Because the super-vectors are composed of phonetic N-Grams, the number of N-Grams grows exponentially as the order N increases, so high-dimensional vectors suffer from data sparseness. In this paper, we propose a feature selection algorithm to address this problem: it uses the maximum relevance criterion based on mutual information to select the most discriminative N-Grams for identifying languages. The effectiveness of the technique is demonstrated on the NIST 2005 language recognition 30-second task, where we achieve an equal error rate (EER) of 4.81%.
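
The maximum relevance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their exact formulation, so the sketch assumes each N-Gram is reduced to a binary presence indicator per utterance and is scored by its mutual information I(F; C) with the language label. The function names `mutual_information` and `select_max_relevance`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(F; C) in bits between a discrete feature and labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (feature value, label) pairs
    f_counts = Counter(feature)            # marginal counts of feature values
    c_counts = Counter(labels)             # marginal counts of labels
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


def select_max_relevance(rows, labels, k):
    """Rank feature columns by I(F; C) and return the top-k indices and all scores."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (assumed): each row is one utterance, each column marks whether
# a given N-Gram occurs in it; labels are the utterance languages.
rows = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["en", "en", "zh", "zh"]

selected, scores = select_max_relevance(rows, labels, k=1)
print(selected, scores)
```

In this toy set the first column separates the two languages perfectly and receives 1 bit of mutual information, while the other two columns are independent of the label and score 0, so only the first N-Gram would be kept. A real system would apply the same ranking to the full N-Gram super-vector and choose k on held-out data.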

Keywords

natural language processing; speech recognition; data sparseness; feature selection algorithm; language recognition; maximum relevance criteria; parallel phoneme recognition; phonetic N-Grams; super vectors; vector space modeling; Hidden Markov models; Information science; Laboratories; Lattices; Mutual information; Natural languages; Probability; Space technology; Support vector machine classification; Support vector machines;

fLanguage

English

Publisher

IEEE

Conference_Titel

Image and Signal Processing, 2009. CISP '09. 2nd International Congress on

Conference_Location

Tianjin

Print_ISBN

978-1-4244-4129-7

Electronic_ISBN

978-1-4244-4131-0

Type

conf

DOI

10.1109/CISP.2009.5303829

Filename

5303829