Title :
A new approach to feature selection for text categorization
Author :
Li, Shoushan ; Zong, Chengqing
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
Date :
30 Oct.-1 Nov. 2005
Abstract :
Text categorization (TC) is the problem of assigning a document to one or more predefined classes. One of the most important issues in TC is feature selection. In this paper, we propose a new approach to feature selection called Strong Class Information Words (SCIW). Unlike many existing feature selection methods, our method takes many kinds of information into account. Moreover, the method can easily exploit some implicit regularities of natural language. In our extensive experiments, a linear classifier using the SCIW feature selection method achieved good precision. The experiments also show that the most attractive aspect of this classifier is its use as a component of a combined categorization system: the combined system outperforms conventional classifiers.
Keywords :
classification; feature extraction; learning (artificial intelligence); text analysis; Strong Class Information Words; feature selection; linear classifier; natural language; text categorization; Frequency; Information analysis; Laboratories; Machine learning; Mutual information; Natural languages; Pattern recognition; Support vector machine classification; Support vector machines; Text categorization
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598812