Title :
Improvement and Application of TF·IDF Method Based on Text Classification
Author :
Kuang, Qiaoyan ; Xu, Xiaoming
Author_Institution :
Comput. Dept., Hunan Int. Econ. Univ., Changsha, China
Abstract :
Feature extraction is an important prerequisite for classifying text effectively and automatically. TF·IDF is widely used to express text feature weights, but it has shortcomings: it cannot reflect the distribution of terms in the text, and therefore cannot reflect a term's degree of importance or the differences between categories. This paper proposes a new feature weighting method, TF·IDF·Ci, which adds a new weight Ci to the original TF·IDF to express the differences between classes. When TF·IDF·Ci is combined with a specific classification algorithm, it always yields a larger macro F1 value than TF·IDF. Meanwhile, the standard deviation of the classification index for TF·IDF·Ci is much smaller than that for TF·IDF. This shows that TF·IDF·Ci not only improves classification precision but also decreases sensitivity to feature dimensions to some extent.
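The idea of multiplying TF·IDF by a class-aware factor can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not give the formula for Ci, so the `class_factor` used here (one plus the spread of a term's per-class document rate) is a hypothetical stand-in, not the authors' definition. The function name `tf_idf_ci` and the toy corpus are likewise invented for the example.

```python
import math
from collections import Counter


def tf_idf_ci(docs, labels):
    """Weight terms by TF * IDF * (hypothetical class factor).

    docs   -- list of token lists
    labels -- class label for each document, in the same order
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    # Per-class document frequency and class sizes.
    class_sizes = Counter(labels)
    class_df = {c: Counter() for c in class_sizes}
    for doc, label in zip(docs, labels):
        class_df[label].update(set(doc))

    def class_factor(term):
        # ASSUMPTION: stand-in for Ci, not the paper's formula.
        # Fraction of documents in each class that contain the term;
        # a term concentrated in one class gets a factor close to 2,
        # a term spread evenly over all classes gets a factor of 1.
        rates = [class_df[c][term] / class_sizes[c] for c in class_sizes]
        return 1.0 + max(rates) - min(rates)

    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc))            # TF: relative frequency
            * math.log(n_docs / df[term])       # IDF: plain logarithmic form
            * class_factor(term)                # class-difference factor
            for term, count in counts.items()
        })
    return weighted


docs = [
    ["ball", "goal", "team"],
    ["ball", "team", "win"],
    ["stock", "market", "team"],
    ["stock", "price", "market"],
]
labels = ["sport", "sport", "finance", "finance"]
weights = tf_idf_ci(docs, labels)
```

In this toy corpus, "ball" occurs only in the sport documents while "team" occurs in both classes, so the class factor raises the weight of "ball" relative to "team" in the first document. Plain TF·IDF would distinguish them only through their overall document frequency.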
Keywords :
feature extraction; text analysis; TF·IDF method; TF·IDF·Ci method; feature weighting method; text classification; Classification algorithms; Computers; Economics; Sensitivity; Support vector machine classification; Text categorization
Conference_Title :
Internet Technology and Applications, 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5142-5
Electronic_ISBN :
978-1-4244-5143-2
DOI :
10.1109/ITAPP.2010.5566113