Title : 
A Feature Selection Simultaneously Based on Intra-category and Extra-Category for Text Categorization
         
Author :
Liu, Zhiying ; Yang, Jieming
         
Author_Institution :
Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
         
Abstract :
Text categorization is an important means of automatically processing information, whose volume is growing exponentially. However, because text corpora are high-dimensional, many sophisticated classifiers cannot be applied to text categorization efficiently and effectively. Feature selection has therefore become a research focus in text categorization. In this paper, we propose a new feature selection method, named SIE, which simultaneously considers the number of documents that contain a feature within a category (intra-category) and outside it (extra-category). We compare the proposed method with four well-known feature selection methods using two classification algorithms on two text corpora. The experiments show that, in terms of accuracy, the proposed method performs significantly better than information gain, orthogonal centroid feature selection and Poisson distribution, and performs comparably to the χ²-statistic when the Naïve Bayes classifier and support vector machines are used.
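This record does not give SIE's scoring formula, so the sketch below does not implement SIE. Under that caveat, it illustrates the per-category document counts the abstract refers to (documents containing a feature inside a category, A, and outside it, B) and uses them to score features with the χ²-statistic, one of the baselines named in the abstract, in its usual 2×2 contingency-table form. The function names (`contingency_counts`, `chi_square`, `select_top_k`), the toy corpus, and ranking each feature by its maximum score over categories are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def contingency_counts(
    docs: Sequence[Set[str]], labels: Sequence[str], term: str, category: str
) -> Tuple[int, int, int, int]:
    """Count documents in the 2x2 table for one (term, category) pair.

    A: documents in `category` that contain `term` (intra-category count)
    B: documents outside `category` that contain `term` (extra-category count)
    C: documents in `category` that do not contain `term`
    D: documents outside `category` that do not contain `term`
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        inside = label == category
        present = term in tokens
        if inside and present:
            a += 1
        elif present:
            b += 1
        elif inside:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): score it as 0.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_top_k(docs: Sequence[Set[str]], labels: Sequence[str], k: int) -> List[str]:
    """Rank terms by their maximum chi-square over categories (an assumed
    aggregation) and keep the top k; ties are broken alphabetically."""
    vocab = sorted(set().union(*docs))
    categories = sorted(set(labels))
    scores = {
        term: max(chi_square(*contingency_counts(docs, labels, term, cat)) for cat in categories)
        for term in vocab
    }
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a set of tokens.
    docs = [{"ball", "goal"}, {"goal", "team"}, {"vote", "party"}, {"party", "goal"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(contingency_counts(docs, labels, "goal", "sport"))  # (2, 1, 0, 1)
    print(select_top_k(docs, labels, 2))  # ['party', 'ball']
```

In the toy corpus, "party" occurs in every politics document and no sport document, so it receives the largest possible score (N = 4); the remaining terms tie at 4/3 and are ordered alphabetically. A method such as SIE would replace `chi_square` with its own function of the same intra- and extra-category counts.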
         
Keywords :
Bayes methods; support vector machines; text analysis; Naive Bayes classifier; Poisson distribution; SIE; SVM; χ²-statistic; classification algorithms; extracategory; information gain; intracategory; orthogonal centroid feature selection; text categorization; text corpora; Accuracy; Educational institutions; Machine learning; Training; dimensionality reduction; feature selection
         
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2011 International Conference on
         
Conference_Location :
Zhejiang
         
Print_ISBN :
978-1-4577-0676-9
         
DOI :
10.1109/IHMSC.2011.114