• DocumentCode
    1598439
  • Title

    A Feature Selection Simultaneously Based on Intra-category and Extra-Category for Text Categorization

  • Author

    Liu, Zhiying ; Yang, Jieming

  • Author_Institution
    Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
  • Volume
    2
  • fYear
    2011
  • Firstpage
    178
  • Lastpage
    181
  • Abstract
    Text categorization is an important means of automatically processing information, the volume of which is growing exponentially. However, because of the high dimensionality of text corpora, many sophisticated classifiers cannot be used efficiently and effectively for text categorization, so feature selection has become a research focus in this area. In this paper, we propose a new feature selection method, named SIE, which simultaneously considers the number of documents that contain a feature in the intra-category and in the extra-category. We compare the proposed method with four well-known feature selection methods using two classification algorithms on two text corpora. The experiments show that, in terms of accuracy, the proposed method performs significantly better than information gain, orthogonal centroid feature selection, and Poisson distribution, and performs comparably to the χ²-statistic when Naïve Bayes and support vector machine classifiers are used.
  • Keywords
    Bayes methods; support vector machines; text analysis; Naive Bayes classifier; Poisson distribution; SIE; SVM; χ²-statistic; classification algorithms; extracategory; information gain; intracategory; orthogonal centroid feature selection; text categorization; text corpora; Accuracy; Educational institutions; Machine learning; Training; dimensionality reduction; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2011 International Conference on
  • Conference_Location
    Zhejiang
  • Print_ISBN
    978-1-4577-0676-9
  • Type

    conf

  • DOI
    10.1109/IHMSC.2011.114
  • Filename
    6038244