DocumentCode :
2513577
Title :
A novel term weighting scheme with distributional coefficient for text categorization with support vector machine
Author :
Ping, Yuan ; Zhou, Ya-jian ; Yang, Yi-Xian ; Peng, Wei-ping
fYear :
2010
fDate :
28-30 Nov. 2010
Firstpage :
182
Lastpage :
185
Abstract :
In text categorization, vectorizing a document by probability distribution is an effective way to reduce dimensionality and save training time. However, data sets in which categories share many common keywords seriously degrade classification performance. To address this problem, we first construct a term weighting scheme that combines posterior probability and relevance frequency to improve the performance of the traditional hybrid classification model. To better represent the information contained in a document, we then introduce an advanced hybrid classification model and propose a novel term weighting scheme with a distributional coefficient to further improve accuracy. Experimental results show that the proposed schemes perform significantly better than the traditional method.
Keywords :
classification; statistical distributions; support vector machines; text analysis; classification performance; data sets; dimension reduction; distributional coefficient; information representation; probability distribution; relevance frequency; support vector machine; term weighting scheme; text categorization; Feature extraction; Hybrid power systems; Kernel; Machine learning; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8883-4
Type :
conf
DOI :
10.1109/YCICT.2010.5713075
Filename :
5713075