Title :
Optimized Approach of Feature Selection Based on Information Gain
Author :
Guohua Wu;Junjun Xu
Author_Institution :
Sch. of Comput. Sci. &
Abstract :
Text feature selection is a key technology in text classification and text information retrieval. Information gain is a feature selection method widely applied in text categorization. This paper theoretically analyzes the deficiencies of information gain as a feature selection method and introduces two improvement factors, LDFWF (Limiting Document Frequency's Word Frequency) and DI (Distribution Information), on the basis of which an improved text feature selection method is proposed. In the experiments, an SVM classifier is used for text classification, with features selected by either standard information gain or the proposed improved information gain. The results show that the proposed method effectively improves the accuracy of text classification.
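For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard information gain score for a term, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), computed from document presence only. It is not the improved method of the paper: the abstract does not define LDFWF or DI, so they are not implemented here. The function names, the set-of-tokens document representation, and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard information gain of `term` with respect to the class labels.

    docs   : list of token sets, one per document (hypothetical representation)
    labels : list of class labels, aligned with docs
    Only the presence or absence of the term in a document is used.
    """
    n = len(docs)
    if n == 0:
        return 0.0
    classes = sorted(set(labels))
    with_term = Counter()
    without_term = Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy([with_term[c] + without_term[c] for c in classes])
    h_given_term = entropy([with_term[c] for c in classes])
    h_given_no_term = entropy([without_term[c] for c in classes])
    return (h_class
            - (n_with / n) * h_given_term
            - (n_without / n) * h_given_no_term)


# Toy corpus (invented for illustration, not data from the paper).
docs = [{"ball", "team"}, {"ball", "score"}, {"vote", "team"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]

# Rank the vocabulary by information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranking = sorted(vocabulary,
                 key=lambda t: information_gain(docs, labels, t),
                 reverse=True)
```

In this toy corpus, "ball" separates the two classes perfectly and scores 1 bit, while "team" appears equally in both classes and scores 0. Because the score depends only on whether a term occurs in a document, how often it occurs within a document plays no role in this baseline.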
Keywords :
"Text categorization","Classification algorithms","Support vector machines","Limiting","Frequency measurement","Algorithm design and analysis","Computer science"
Conference_Title :
Computer Science and Mechanical Automation (CSMA), 2015 International Conference on
DOI :
10.1109/CSMA.2015.38