• DocumentCode
    3318242
  • Title

    A new approach to feature selection for text categorization

  • Author

    Li, Shoushan; Zong, Chengqing

  • Author_Institution
    Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    626
  • Lastpage
    630
  • Abstract
    Text categorization (TC) is the problem of assigning a document to predefined classes. One of the most important issues in TC is feature selection. In this paper, we propose a new approach to feature selection called Strong Class Information Words (SCIW). Unlike many existing feature selection methods, our method takes many kinds of information into account. Moreover, the method can easily exploit some implicit regularities of natural language. In our extensive experiments, a linear classifier using the SCIW feature selection method achieved good precision. The experiments also show that the classifier is most attractive as a component of a combined categorization system, and that the combined system outperforms conventional classifiers.
  • Keywords
    classification; feature extraction; learning (artificial intelligence); text analysis; Strong Class Information Words; feature selection; linear classifier; natural language; text categorization; Frequency; Information analysis; Laboratories; Machine learning; Mutual information; Natural languages; Pattern recognition; Support vector machine classification; Support vector machines; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type

    conf

  • DOI
    10.1109/NLPKE.2005.1598812
  • Filename
    1598812