DocumentCode
3694424
Title
A new term weighting scheme based on class specific document frequency for document representation and classification
Author
Suthira Plansangket;John Q Gan
Author_Institution
School of Computer Science and Electronic Engineering, University of Essex, United Kingdom
fYear
2015
Firstpage
5
Lastpage
8
Abstract
Document classification is usually more challenging than numerical data classification because documents are much harder to represent effectively for classification purposes than numerical data. The vector space model (VSM) has been widely used for document representation in classification; it represents a document as a vector of feature values based on a bag of words. This paper proposes a new feature for document representation under the VSM framework, class specific document frequency (CSDF), which leads to a novel term weighting scheme based on term frequency (TF), term presence (TP), and the newly proposed feature. The experimental results show that the proposed features, CSDF and TF-CSDF, improve document classification performance in comparison with other widely used VSM document representations.
Keywords
"Accuracy","Training","Support vector machines","Computer science","Testing","Semantics","Feature extraction"
Publisher
IEEE
Conference_Titel
Computer Science and Electronic Engineering Conference (CEEC), 2015 7th
Type
conf
DOI
10.1109/CEEC.2015.7332690
Filename
7332690