DocumentCode
2480533
Title
A Study for Important Criteria of Feature Selection in Text Categorization
Author
Xu Yan
Author_Institution
Beijing Language & Culture Univ., Beijing, China
fYear
2010
fDate
22-23 May 2010
Firstpage
1
Lastpage
4
Abstract
A major difficulty in text categorization (TC) is the high dimensionality of the feature space, and feature selection (FS) is an important step for reducing it. Empirical studies show that good text categorization performance is related to certain feature selection criteria; when a criterion is not satisfied, this often indicates that the method is non-optimal. According to our analysis, feature selection performs well in text categorization tasks for several reasons, including favoring common terms, using category information, and using term frequency information. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but none of them satisfies all of the criteria above. In this paper, we present some important criteria for FS in TC. Experimental results indicate that the empirical performance of an FS function is tightly related to how well it satisfies these criteria.
Keywords
text analysis; category information; document frequency thresholding; feature selection criteria; feature space; frequency information; information gain; mutual information; text categorization; Availability; Document handling; Frequency measurement; Gain measurement; Information analysis; Mutual information; Organizing; Performance analysis; Space technology; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems and Applications (ISA), 2010 2nd International Workshop on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5872-1
Electronic_ISBN
978-1-4244-5874-5
Type
conf
DOI
10.1109/IWISA.2010.5473381
Filename
5473381