Title of article :
Class-indexing-based term weighting for automatic text classification
Author/Authors :
Fuji Ren (author), Mohammad Golam Sohrab (author)
Issue Information :
Journal with serial number, year 2013
Pages :
17
From page :
109
To page :
125
Abstract :
Most previous studies on term weighting emphasize document-indexing-based approaches and approaches based on four fundamental information elements to address automatic text classification (ATC). In this study, we introduce class-indexing-based term-weighting approaches and evaluate their effects in high-dimensional and comparatively low-dimensional vector spaces against TF.IDF and five other term-weighting approaches, which serve as baselines. First, we implement a class-indexing-based TF.IDF.ICF observational term-weighting approach in which the inverse class frequency (ICF) is incorporated. In the experiments, we investigate the effects of TF.IDF.ICF on the Reuters-21578, 20 Newsgroups, and RCV1-v2 benchmark collections; the method provides positive discrimination on rare terms in the vector space but is biased against frequent terms in the text classification (TC) task. We therefore revise the ICF function, implement a new inverse class space density frequency (ICSδF), and derive the TF.IDF.ICSδF method, which provides positive discrimination on both infrequent and frequent terms. We present a detailed evaluation of each category for the three datasets with the term-weighting approaches. The experimental results show that the proposed class-indexing-based TF.IDF.ICSδF term-weighting approach is promising compared with the well-known baseline term-weighting approaches.
Keywords :
Classifier, Text classification, Indexing, Term weighting, Machine learning, Feature selection
Journal title :
Information Sciences
Serial Year :
2013
Record number :
1215620