Title :
An Improved Approach to Terms Weighting in Text Classification
Author :
Ma Zhanguo ; Feng Jing ; Chen Liang ; Hu Xiangyi ; Shi Yanqin
Author_Institution :
Beijing Sci. & Technol. Inf. Inst., Beijing, China
Abstract :
Most traditional text classification methods use term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and to compute their weights when classifying a text document. Term weighting plays an important role in achieving high performance in text classification. Although the tf-idf model is popular, it does not take the class information of terms into account. This paper proposes an improved model, tf-idf-ci, for computing term weights, which incorporates both intra class information and inner class information. Experimental results show that performance is enhanced: the role of important and representative terms is raised, and the effect of unimportant feature terms on classification is decreased. In addition, the F1 score obtained with the tf-idf-ci algorithm is higher than that obtained with the traditional tf-idf model.
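For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of conventional tf-idf weighting, assuming length-normalized term frequency and an unsmoothed natural-log idf. The function name `tf_idf` and the toy documents are hypothetical and chosen only for illustration. The class-information (ci) factor is not defined in the abstract, so it is deliberately not shown here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed baseline (not taken from the paper):
    weight = (count / doc_length) * ln(n_docs / doc_freq).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: three tokenized documents.
docs = [
    ["stock", "market", "price"],
    ["stock", "football", "match"],
    ["football", "match", "goal"],
]
weights = tf_idf(docs)
```

In this sketch a term that occurs in every document receives weight zero regardless of which class those documents belong to, which is the limitation the abstract points to: the baseline weight depends only on document counts, not on how a term is distributed across classes.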
Keywords :
pattern classification; text analysis; inner class information; intra class information; inverse document frequency; term frequency; terms weighting; text document classification; tf-idf model; Analytical models; Classification algorithms; Computational modeling; Machine learning; Support vector machine classification; Text categorization; Training;
Conference_Title :
Computer and Management (CAMAN), 2011 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-9282-4
DOI :
10.1109/CAMAN.2011.5778755