DocumentCode
537586
Title
Improved Feature Selection Algorithm Based on Concentration and Dispersion
Author
Shen You-Wen ; Zhao Xin-Jian
Author_Institution
Coll. of Comput. Sci. & Technol., Zhejiang Univ. of Technol., Hangzhou, China
Volume
1
fYear
2010
fDate
23-24 Oct. 2010
Firstpage
262
Lastpage
265
Abstract
This paper analyzes the concentration and dispersion measures of the integrated feature selection algorithm (TFFS) and identifies their shortcomings: concentration has difficulty measuring the weight of low-frequency terms, and dispersion ignores the impact of terms whose mutual information is negative. A modified feature selection algorithm (TFFSL) is proposed, which improves both concentration and dispersion and takes term length as a weighting factor. SVM classification experiments show that, compared with the TFFS algorithm, the TFFSL algorithm achieves higher accuracy and is better at eliminating irrelevant terms.
Keywords
feature extraction; information management; pattern classification; support vector machines; text analysis; SVM classification; TFFSL algorithm; feature selection algorithm; mutual information; feature selection; feature weight; support vector machine; text classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information Systems and Mining (WISM), 2010 International Conference on
Conference_Location
Sanya
Print_ISBN
978-1-4244-8438-6
Type
conf
DOI
10.1109/WISM.2010.28
Filename
5662323