Title :
An Improved Mutual Information-Based Feature Selection Algorithm for Text Classification
Author :
Jiang Xiao-Yu ; Jin Shui
Author_Institution :
Bus. Sch., Beijing Inst. of Fashion Technol., Beijing, China
Abstract :
Feature selection plays an important role in text classification and directly affects classification accuracy. The mutual information (MI)-based feature selection method has known defects: it tends to select rare words and words from small samples as features, and it can produce negative MI values. To correct these defects, this paper proposes an improved feature evaluation function for automatic text classification that jointly considers word frequency, concentration rate between classes, and dispersion within a class. Experimental results indicate that the improved algorithm remedies the tendency of the original MI evaluation function to select rare words and significantly improves classification performance.
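The rare-word bias and the negative values mentioned in the abstract can be illustrated with the standard document-count estimate of MI between a term and a class, MI(t, c) = log(A * N / ((A + C) * (A + B))). The following is a minimal sketch of that baseline only; it is not the improved evaluation function proposed in the paper, which the abstract does not specify. The function name, the argument names, and the document counts are illustrative assumptions.

```python
import math


def mutual_information(a, b, c, n):
    """Baseline MI score between a term t and a class k (illustrative sketch).

    a: number of documents in class k that contain t
    b: number of documents outside class k that contain t
    c: number of documents in class k that do not contain t
    n: total number of documents
    """
    if a == 0:
        # The term never co-occurs with the class; the log is undefined,
        # so return negative infinity by convention.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, 100 of them in the class.
    rare = mutual_information(a=1, b=0, c=99, n=1000)      # term in a single document
    common = mutual_information(a=80, b=20, c=20, n=1000)  # term in 80% of the class
    offclass = mutual_information(a=5, b=195, c=95, n=1000)  # term mostly outside the class

    print(f"rare word:      {rare:.3f}")      # log(10)
    print(f"common word:    {common:.3f}")    # log(8)
    print(f"off-class word: {offclass:.3f}")  # log(0.25), negative
```

In this made-up corpus, a word that occurs in a single document outscores a word that covers 80% of the class, and a word that occurs mostly outside the class receives a negative score. These are the two defects that the abstract says the improved function addresses.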
Keywords :
classification; text analysis; concentration rate; feature evaluation function; feature selection; mutual information; text classification; word frequency; Classification algorithms; Computers; Dispersion; Frequency measurement; Mutual information; Text categorization; Training
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2013 5th International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-0-7695-5011-4
DOI :
10.1109/IHMSC.2013.37