DocumentCode
2371195
Title
A feature selection framework for text filtering
Author
Zheng, Zhaohui ; Srihari, Rohini ; Srihari, Sargur
Author_Institution
CEDAR, State Univ. of New York, Buffalo, NY, USA
fYear
2003
fDate
19-22 Nov. 2003
Firstpage
705
Lastpage
708
Abstract
We present a new framework for local feature selection in text filtering. In this framework, a feature set is constructed per category by first selecting a set of terms highly indicative of membership (positive set) and another set of terms highly indicative of nonmembership (negative set), and then combining these two sets. This feature selection framework not only unifies several standard feature selection methods, but also facilitates the proposal of a new method that optimally combines the positive and negative sets. The experimental comparison between the proposed method and standard methods was conducted on six feature selection metrics: chi-square, correlation coefficient, odds ratio, GSS coefficient, and two proposed variants of odds ratio and GSS coefficient, OR-square and GSS-square, respectively. The results show that the proposed feature selection method improves text filtering performance.
Keywords
correlation methods; feature extraction; statistical analysis; text analysis; GSS coefficient; chi-square metric; correlation coefficient; data mining; feature selection method; feature set; text filtering; Chromium; Computer science; Data mining; Feedback; Frequency measurement; Gain measurement; Information filtering; Information filters; Mutual information; Proposals
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN
0-7695-1978-4
Type
conf
DOI
10.1109/ICDM.2003.1251013
Filename
1251013