Title :
A feature selection based on deviation from feature centroid for text categorization
Author :
Yang, Jieming ; Liu, Zhiying
Author_Institution :
Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
Abstract :
Text categorization is vital for helping people automatically process information, the volume of which is growing exponentially. However, the high dimensionality of the vector space is a major obstacle to applying many sophisticated learning algorithms to text categorization, so feature selection has become a research focus in the field. In this paper, we propose a new feature selection method, named FCFS, which uses the deviation from the feature centroid over all categories as the score of a feature. We compare the proposed method with four well-known feature selection methods using two classification algorithms on three datasets. The experiments show that the proposed method is significantly better than information gain, orthogonal centroid feature selection, mutual information, and odds ratio in terms of accuracy when Naïve Bayes and support vector machine classifiers are used.
Keywords :
Bayes methods; feature extraction; learning (artificial intelligence); pattern classification; support vector machines; text analysis; FCFS; Naive Bayes classifier; feature selection; information gain; learning algorithm; orthogonal centroid feature selection; support vector machine; text categorization; vector space; Accuracy; Machine learning; Mutual information; Support vector machine classification; Training; feature vector space
Conference_Title :
Intelligent Control and Information Processing (ICICIP), 2011 2nd International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4577-0813-8
DOI :
10.1109/ICICIP.2011.6008227