DocumentCode :
2550017
Title :
Experimental evaluation of feature selection methods for text classification
Author :
Uchyigit, Gulden
Author_Institution :
Sch. of Comput., Eng. & Math., Univ. of Brighton, Lewes, UK
fYear :
2012
fDate :
29-31 May 2012
Firstpage :
1294
Lastpage :
1298
Abstract :
In this paper we present a comparative experimental study of feature selection methods used for text classification. Ten feature selection methods were evaluated, including a new feature selection method called the GU metric. The other feature selection methods evaluated are: Chi-Squared (χ²) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Probabilistic Classifier.
Keywords :
pattern classification; statistical analysis; text analysis; BSS/WSS coefficient method; Chi-Squared statistic method; Fisher criterion method; GSS coefficient method; GU metric; NGL coefficient method; feature selection methods; information gain method; mutual information method; naive probabilistic classifier; odds ratio method; term frequency method; text classification; Classification algorithms; Equations; Measurement; Mutual information; Probabilistic logic; Text categorization; Training
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
Type :
conf
DOI :
10.1109/FSKD.2012.6234191
Filename :
6234191