• DocumentCode
    1126056
  • Title
    Introducing a family of linear measures for feature selection in text categorization
  • Author
    Combarro, Elías F. ; Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Mones, Ricardo
  • Author_Institution
    Artificial Intelligence Center, University of Oviedo, Gijón, Spain
  • Volume
    17
  • Issue
    9
  • fYear
    2005
  • Firstpage
    1223
  • Lastpage
    1232
  • Abstract
    Text categorization, which consists of automatically assigning documents to a set of categories, usually involves the management of a huge number of features. Most of them are irrelevant and others introduce noise which could mislead the classifiers. Thus, feature reduction is often performed in order to increase the efficiency and effectiveness of the classification. In this paper, we propose to select relevant features by means of a family of linear filtering measures which are simpler than the usual measures applied for this purpose. We carry out experiments over two different corpora and find that the proposed measures perform better than the existing ones.
  • Keywords
    classification; feature extraction; information filtering; learning (artificial intelligence); pattern classification; text analysis; document classification; feature reduction; feature selection; linear filtering measures; machine learning; text categorization; Availability; Filtering; Frequency; Humans; Machine learning; Maximum likelihood detection; Nonlinear filters; Performance evaluation; Text categorization; Wrapping; filtering measures
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.149
  • Filename
    1490529