Title :
An improved method of term weighting for text classification
Author :
Jiang, Hua ; Li, Ping ; Hu, Xin ; Wang, Shuyan
Author_Institution :
Sch. of Comput. Sci., Northeast Normal Univ., Changchun, China
Abstract :
In text classification, term weighting methods assign appropriate weights to terms in order to improve classification performance. Traditional term weighting algorithms consider only factors such as tf (term frequency) and idf (inverse document frequency). This approach simply treats low-frequency terms as important and high-frequency terms as unimportant, so it often assigns higher weights to rare terms. In this paper, we present an effective term weighting approach that avoids this deficiency of the traditional approach, and we use kNN classifiers to evaluate it on the widely used benchmark data set Reuters-21578. The experimental results show that the new approach can improve classification accuracy.
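The rare-term bias described in the abstract can be illustrated with a minimal sketch of the classic tf-idf weighting, assuming the common form weight(t, d) = tf(t, d) * log(N / df(t)). This is the traditional baseline only, not the improved method proposed in the paper, which the abstract does not specify. The function name `tf_idf` and the toy corpus are hypothetical and are not taken from Reuters-21578.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classic tf-idf: weight(t, d) = tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # df(t): number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, used only for illustration
docs = [
    ["oil", "price", "market"],
    ["oil", "market", "trade"],
    ["oil", "wheat", "market"],
]
w = tf_idf(docs)
print(w[0])
```

In this toy corpus, "oil" and "market" occur in every document, so their idf is log(3/3) = 0 and they receive zero weight, while "price", which occurs in a single document, receives the largest weight in the first document. This is the behavior the abstract identifies as a deficiency.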
Keywords :
pattern classification; text analysis; Reuters-21578 benchmark data set; high frequency terms; inverse document frequency; kNN classifiers; low frequency terms; term frequency; term weighting methods; text classification; Algorithm design and analysis; Computer science; Data mining; Delta modulation; Design methodology; Frequency; Information retrieval; Information theory; Performance gain; Text categorization; Text classification; kNN; term weighting; tf-idf;
Conference_Title :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
DOI :
10.1109/ICICISYS.2009.5357842