DocumentCode :
3629965
Title :
An evaluation of existing and new feature selection metrics in text categorization
Author :
Serafettin Tasci;Tunga Gungor
Author_Institution :
Computer Engineering Department, Bogazici University, Bebek, 34342 Istanbul, Turkey
fYear :
2008
Firstpage :
1
Lastpage :
6
Abstract :
Text categorization is widely used for organizing and manipulating documents in electronic media. Because the data in text categorization are high-dimensional, feature selection is crucial for making the task more efficient and precise. In this paper, we extensively evaluate the feature selection metrics used in text categorization under local and global policies. For the experiments, we use three datasets that vary in size, complexity, and skewness. We use SVM as the classifier and tf-idf for term weighting. We observe that, for almost all metrics, the local policy performs better when the number of keywords is low, and the global policy performs better as the number of keywords increases. In addition to evaluating the existing feature selection metrics, we propose new metrics, which show high success rates, especially in datasets with a low number of keywords. Moreover, we propose a keyword selection policy called Adaptive Keyword Selection (AKS). It selects a different number of keywords for each class, and it improved performance significantly on skewed datasets.
Keywords :
"Text categorization","Support vector machines","Support vector machine classification","Organizing","Machine learning","Neural networks","Machine learning algorithms","Electronic publishing","Software libraries","Information retrieval"
Publisher :
IEEE
Conference_Titel :
Computer and Information Sciences, 2008. ISCIS '08. 23rd International Symposium on
Print_ISBN :
978-1-4244-2880-9
Type :
conf
DOI :
10.1109/ISCIS.2008.4717900
Filename :
4717900