DocumentCode :
3678554
Title :
An Improved Information Gain Feature Selection Algorithm for SVM Text Classifier
Author :
Jiamin Xu;Hong Jiang
Author_Institution :
Dept. of Comput. Center, East China Normal Univ., Shanghai, China
fYear :
2015
Firstpage :
273
Lastpage :
276
Abstract :
The choice of feature selection algorithm strongly influences the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm tends to select features that rarely appear in the specified category but frequently appear in other categories. To overcome this drawback, an improved IG feature selection method is proposed, based on an in-depth analysis of related algorithms. First, features are selected per category of the data set, and the features from the different categories are merged by an optimized method. Then, the IG weight is calculated using the probability of occurrence of these features. Finally, a between-class concentration distribution factor and a within-class word-frequency dispersion distribution factor are introduced. An SVM classifier is used to verify the algorithm. The results show that the improved method performs better than the original IG and two other improved methods.
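For orientation, the traditional IG score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the textbook definition, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), and not the authors' improved method: the category-wise selection, the merging step, and the two distribution factors are not reproduced here. The function names `entropy` and `information_gain`, and the representation of documents as token sets, are assumptions made for this sketch.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Textbook IG of a term over a labeled corpus.

    docs   -- non-empty list of token sets, one per document (assumed format)
    labels -- list of class labels, aligned with docs
    term   -- the candidate feature
    """
    classes = sorted(set(labels))
    all_counts = Counter(labels)
    # Class counts among documents that contain / do not contain the term.
    with_counts = Counter(l for d, l in zip(docs, labels) if term in d)
    without_counts = Counter(l for d, l in zip(docs, labels) if term not in d)

    n = len(docs)
    n_with = sum(with_counts.values())
    n_without = n - n_with

    h_c = entropy([all_counts[c] for c in classes])
    h_with = entropy([with_counts[c] for c in classes])
    h_without = entropy([without_counts[c] for c in classes])
    # Prior class entropy minus the expected entropy after observing the term.
    return h_c - (n_with / n) * h_with - (n_without / n) * h_without
```

Under this definition the absence of a term contributes to the score as much as its presence does, which is one reading of why the traditional method can favor features that are rare in the category of interest.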
Keywords :
"Text categorization","Classification algorithms","Algorithm design and analysis","Accuracy","Dispersion","Training","Support vector machines"
Publisher :
IEEE
Conference_Titel :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2015 International Conference on
Type :
conf
DOI :
10.1109/CyberC.2015.53
Filename :
7307826