Title :
A new weighting algorithm for linear classifier
Author :
Chen, Keli ; Zong, Chengqing
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
Abstract :
In the domain of text categorization (TC), the TF (term frequency)*IDF (inverse document frequency) weighting algorithm and the TF*IWF*IWF weighting algorithm are widely used. However, both algorithms are overly biased by term frequency and neglect the imbalance between classes. In this paper, we propose a new weighting algorithm named TF (term frequency)*IWF (inverse word frequency)*IWF (inverse word frequency)*VE (variance and expectation). The new algorithm improves on the TF*IWF*IWF weighting algorithm in two respects: the TF factor and the added VE factor. This paper compares the new algorithm with the TF*IWF*IWF algorithm both theoretically and experimentally. In a preliminary experiment, the F1-measure improved by 11.78%.
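For reference, the two baseline weightings named in the abstract can be sketched as follows. This is a minimal illustration assuming the commonly cited definitions: TF*IDF as tf(t, d) * log(N / df(t)), and TF*IWF*IWF as tf(t, d) * log(O / o(t))^2, where o(t) is the number of occurrences of term t in the whole corpus and O is the total number of word occurrences. The function names are hypothetical. The paper's modified TF factor and its VE (variance and expectation) factor are not defined in this abstract, so they are not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Baseline TF*IDF (assumed standard form): w(t, d) = tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def tf_iwf_iwf(docs):
    """Baseline TF*IWF*IWF (assumed standard form): w(t, d) = tf(t, d) * log(O / o(t)) ** 2.

    o(t) is the total number of occurrences of t in the corpus and
    O is the total number of word occurrences in the corpus.
    """
    occ = Counter()
    for doc in docs:
        occ.update(doc)  # corpus-wide occurrence counts
    total = sum(occ.values())
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(total / occ[t]) ** 2 for t, c in tf.items()})
    return weights


# Tiny hypothetical corpus: each document is a list of tokens.
docs = [["a", "b", "a"], ["b", "c"]]
print(tf_idf(docs))
print(tf_iwf_iwf(docs))
```

Note that neither baseline uses class labels, which is the imbalance between classes the abstract says both algorithms neglect.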
Keywords :
classification; computational linguistics; text analysis; inverse document frequency; inverse word frequency; term frequency; text categorization; text classifier; weighting algorithm; Automation; Equations; Frequency; Laboratories; Pattern recognition; Statistics; Testing;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275987