DocumentCode :
3001839
Title :
An Improved Algorithm to Term Weighting in Text Classification
Author :
Li, Ran ; Guo, Xianjiu
Author_Institution :
Inf. Eng. Coll., Dalian Ocean Univ., Dalian, China
fYear :
2010
fDate :
29-31 Oct. 2010
Firstpage :
1
Lastpage :
3
Abstract :
The traditional TF-IDF algorithm is commonly used to measure feature weight in text categorization. However, the algorithm doesn't take the inter-class and intra-class distribution of feature terms into consideration. Consequently, it can't effectively weigh the distribution proportion of feature items. To solve this problem, inter-class and intra-class information entropy, which describes the distribution of feature terms, was used to revise the TF-IDF weight. Simulation experiments demonstrate that the improved TF-IDF algorithm achieves better classification results than the traditional TF-IDF algorithm.
Keywords :
classification; entropy; text analysis; TF-IDF algorithm; distribution proportion; feature weight; information entropy; term weighting; text categorization; text classification; Accuracy; Biological system modeling; Classification algorithms; Entropy; Information entropy; Manganese; Text categorization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia Technology (ICMT), 2010 International Conference on
Conference_Location :
Ningbo
Print_ISBN :
978-1-4244-7871-2
Type :
conf
DOI :
10.1109/ICMULT.2010.5630962
Filename :
5630962