Title :
A term weighting scheme based on the measure of relevance and distinction for text categorization
Author :
Jieming Yang ; Jing Wang ; Zhiying Liu ; Zhaoyang Qu
Author_Institution :
Sch. of Inf. Eng., Northeast Dianli Univ., Jilin, China
Abstract :
Feature selection is often considered a key step in text categorization. In this paper, we propose a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of the terms occurring in a document set. We evaluated AD on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two classification algorithms, Naive Bayes and Support Vector Machines. Experimental results comparing AD with six classic feature selection algorithms, namely Information Gain (IG), Mutual Information (MI), Odds Ratio (OR), the DIA association factor (DIA), Orthogonal Centroid Feature Selection (OCFS) and Ambiguity Measure (AM), show that AD significantly outperforms all six, both when the Naive Bayes classifier is used and when Support Vector Machines are used.
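The abstract does not give AD's scoring formula, so AD itself is not sketched here. As a hedged illustration of the filter-style feature selection that AD is compared against, the following minimal Python sketch scores each term with Information Gain (one of the six baselines), using binary term presence per document, and keeps the top-k terms. The function names (`entropy`, `information_gain`, `select_top_k`) and the toy corpus are hypothetical, and the exact IG variant, smoothing and preprocessing used in the paper may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in document' w.r.t. the class.

    IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]
    docs: list of sets of terms; labels: parallel list of class labels.
    """
    classes = sorted(set(labels))
    with_t = Counter(lab for d, lab in zip(docs, labels) if term in d)
    without_t = Counter(lab for d, lab in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_cond = (n_t / n) * entropy([with_t[c] for c in classes]) + (
        (n - n_t) / n
    ) * entropy([without_t[c] for c in classes])
    return h_c - h_cond


def select_top_k(docs, labels, k):
    """Rank the vocabulary by IG (ties broken alphabetically); keep k terms."""
    vocab = sorted(set().union(*docs))
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
print(select_top_k(docs, labels, 2))
```

On this toy corpus, "goal" and "vote" each separate the two classes perfectly (IG of 1 bit), while "team" occurs once in each class and carries no information (IG of 0), so the call prints `['goal', 'vote']`. A filter method like this scores terms independently of the downstream classifier, which is why the same selected feature set can be evaluated with both Naive Bayes and SVMs.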
Keywords :
Bayes methods; support vector machines; text analysis; DIA association factor; Naive Bayes; benchmark document collections; document set; feature selection algorithm; information gain; mutual information; odds ratio; orthogonal centroid feature selection; term weighting scheme; text categorization; Algorithm design and analysis; Classification algorithms; Feature extraction; Training; Feature selection; term weighting
Conference_Title :
2015 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)
Conference_Location :
Takamatsu
DOI :
10.1109/SNPD.2015.7176178