DocumentCode :
3694424
Title :
A new term weighting scheme based on class specific document frequency for document representation and classification
Author :
Suthira Plansangket;John Q Gan
Author_Institution :
School of Computer Science and Electronic Engineering, University of Essex, United Kingdom
fYear :
2015
Firstpage :
5
Lastpage :
8
Abstract :
Document classification is usually more challenging than numerical data classification, because documents are much harder than numerical data to represent effectively for classification purposes. The vector space model (VSM) has been widely used for document representation in classification; in it, a document is represented by a vector of feature values based on a bag of words. This paper proposes a new feature for document representation under the VSM framework, class specific document frequency (CSDF), which leads to a novel term weighting scheme based on term frequency (TF), term presence (TP), and the newly proposed feature. The experimental results show that the proposed features, CSDF and TF-CSDF, improve document classification performance in comparison with other widely used VSM document representations.
Keywords :
"Accuracy","Training","Support vector machines","Computer science","Testing","Semantics","Feature extraction"
Publisher :
IEEE
Conference_Title :
Computer Science and Electronic Engineering Conference (CEEC), 2015 7th
Type :
conf
DOI :
10.1109/CEEC.2015.7332690
Filename :
7332690