Title :
Improved Feature Selection Algorithm Based on Concentration and Dispersion
Author :
Shen You-Wen ; Zhao Xin-Jian
Author_Institution :
Coll. of Comput. Sci. & Technol., Zhejiang Univ. of Technol., Hangzhou, China
Abstract :
This paper analyzes the concentration and dispersion measures of the integrated feature selection algorithm (TFFS) and identifies their shortcomings: concentration has difficulty measuring the weight of low-frequency terms, and dispersion ignores the impact of terms whose mutual information is negative. A modified feature selection algorithm (TFFSL) is proposed, which improves both concentration and dispersion and takes term length as a weighting factor. SVM classification experiments show that, compared with TFFS, TFFSL achieves higher accuracy and is better at eliminating irrelevant terms.
Keywords :
feature extraction; information management; pattern classification; support vector machines; text analysis; SVM classification; TFFSL algorithm; feature selection algorithm; mutual information; feature selection; feature weight; support vector machine; text classification
Conference_Title :
Web Information Systems and Mining (WISM), 2010 International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-8438-6
DOI :
10.1109/WISM.2010.28