• DocumentCode
    2371195
  • Title
    A feature selection framework for text filtering
  • Author
    Zheng, Zhaohui ; Srihari, Rohini ; Srihari, Sargur
  • Author_Institution
    CEDAR, State Univ. of New York, Buffalo, NY, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    705
  • Lastpage
    708
  • Abstract
    We present a new framework for local feature selection in text filtering. In this framework, a feature set is constructed per category by first selecting a set of terms highly indicative of membership (the positive set) and another set of terms highly indicative of nonmembership (the negative set), and then combining the two sets. The framework not only unifies several standard feature selection methods but also leads to a new method that optimally combines the positive and negative sets. The proposed method was compared experimentally with standard methods using six feature selection metrics: chi-square, correlation coefficient, odds ratio, and GSS coefficient, plus two proposed variants of odds ratio and GSS coefficient, OR-square and GSS-square respectively. The results show that the proposed feature selection method improves text filtering performance.
  • Keywords
    correlation methods; feature extraction; statistical analysis; text analysis; GSS coefficient; chi-square metric; correlation coefficient; data mining; feature selection method; feature set; text filtering; Chromium; Computer science; Data mining; Feedback; Frequency measurement; Gain measurement; Information filtering; Information filters; Mutual information; Proposals
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1251013
  • Filename
    1251013
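
The per-category selection described in the Abstract (a positive set, a negative set, then their combination) can be illustrated with a minimal sketch. It assumes the correlation coefficient as the scoring metric, computed from a 2x2 term/category document-count table, and a hypothetical `positive_ratio` parameter that fixes how the feature budget is split between the two sets. This record does not state the paper's actual combination rule, so the split, the function names, and the sample counts below are illustrative assumptions only.

```python
import math


def correlation_coefficient(a, b, c, d):
    """Signed correlation coefficient of a term with a category.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents lacking the term
    d: out-of-category documents lacking the term
    Positive values indicate membership, negative values nonmembership.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no signal
    return math.sqrt(n) * (a * d - b * c) / math.sqrt(denom)


def select_features(tables, size, positive_ratio):
    """Split a budget of `size` terms into a positive and a negative set.

    tables maps term -> (a, b, c, d).
    positive_ratio is a hypothetical knob (an assumption, not taken from
    the paper): the fraction of the budget given to the positive set.
    """
    scores = {t: correlation_coefficient(*cells) for t, cells in tables.items()}
    n_pos = round(size * positive_ratio)
    n_neg = size - n_pos
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Strongest membership indicators first.
    positive = [t for t in ranked if scores[t] > 0][:n_pos]
    # Strongest nonmembership indicators first.
    negative = [t for t in reversed(ranked) if scores[t] < 0][:n_neg]
    return positive, negative


# Made-up counts for a hypothetical "sports" category (10 documents in, 10 out).
tables = {
    "goal": (8, 1, 2, 9),
    "match": (6, 2, 4, 8),
    "stock": (1, 7, 9, 3),
    "the": (5, 5, 5, 5),
}
positive, negative = select_features(tables, size=2, positive_ratio=0.5)
feature_set = positive + negative
```

Setting `positive_ratio=1.0` in this sketch keeps only membership-indicative terms, while smaller values reserve part of the budget for nonmembership-indicative terms.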