Title :
Experimental evaluation of feature selection methods for text classification
Author :
Uchyigit, Gulden
Author_Institution :
Sch. of Comput., Eng. & Math., Univ. of Brighton, Lewes, UK
Abstract :
In this paper we present a comparative experimental study of feature selection methods for text classification. Ten feature selection methods were evaluated, including a new method called the GU metric. The other nine methods are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. In the experiments, the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data set with the Naive Probabilistic Classifier.
Keywords :
pattern classification; statistical analysis; text analysis; BSS/WSS coefficient method; Chi-Squared statistic method; Fisher criterion method; GSS coefficient method; GU metric; NGL coefficient method; feature selection methods; information gain method; mutual information method; naive probabilistic classifier; odds ratio method; term frequency method; text classification; Classification algorithms; Equations; Measurement; Mutual information; Probabilistic logic; Text categorization; Training;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6234191