DocumentCode
3318242
Title
A new approach to feature selection for text categorization
Author
Li, Shoushan; Zong, Chengqing
Author_Institution
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fYear
2005
fDate
30 Oct.-1 Nov. 2005
Firstpage
626
Lastpage
630
Abstract
Text categorization (TC) is the problem of assigning a document to predefined classes. One of the most important issues in TC is feature selection. In this paper, we propose a new approach to feature selection called Strong Class Information Words (SCIW). Unlike many existing feature selection methods, our method takes many kinds of information into account. Moreover, the method can easily exploit some implicit regularities of natural language. In our extensive experiments, a linear classifier using the SCIW feature selection method achieved good precision. The experiments also show the most attractive aspect of this classifier, namely its use as one component of a combined categorization system: the combined system outperforms conventional classifiers.
Keywords
classification; feature extraction; learning (artificial intelligence); text analysis; Strong Class Information Words; feature selection; linear classifier; natural language; text categorization; Frequency; Information analysis; Laboratories; Machine learning; Mutual information; Natural languages; Pattern recognition; Support vector machine classification; Support vector machines; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN
0-7803-9361-9
Type
conf
DOI
10.1109/NLPKE.2005.1598812
Filename
1598812