• DocumentCode
    3507476
  • Title
    A Variance–Mean Based Feature Selection in Text Classification
  • Author
    Yin, Shen; Jiang, Zongli
  • Author_Institution
    Beijing Univ. of Technol., Beijing
  • Volume
    3
  • fYear
    2009
  • fDate
    7-8 March 2009
  • Firstpage
    519
  • Lastpage
    522
  • Abstract
    Feature selection is an important step in text classification: it chooses the subset of features relevant to a particular application. Building on the mutual information method, we designed a variance–mean based feature selection method (VM). We compute the variance of each word's class discrimination value vector and rank the words by it, which lets us choose the most discriminative features. The method is advantageous when only a small number of features is selected, especially for classes with few training documents. It retains the best features and thus improves the final performance of the classification system. Experimental results indicate the effectiveness of the proposed feature selection method for text classification.
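    The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not define the class discrimination value. The sketch therefore assumes it is the pointwise mutual information log(P(t|c) / P(t)), estimated from document frequencies with add-one smoothing. It also assumes a word's score is the population variance of that per-class vector about its mean. The function names `vm_scores` and `select_features` are hypothetical.

```python
import math
from collections import Counter

def vm_scores(docs, labels):
    """Score each term by the variance of its per-class discrimination vector.

    docs: list of token lists; labels: one class label per document.
    ASSUMPTION: the class discrimination value is pointwise mutual information,
    log(P(t|c) / P(t)), estimated from document frequencies with add-one smoothing.
    """
    classes = sorted(set(labels))
    n_docs = len(docs)
    class_size = Counter(labels)                 # documents per class
    df_total = Counter()                         # documents containing term t
    df_class = {c: Counter() for c in classes}   # documents in class c containing t
    for tokens, c in zip(docs, labels):
        for t in set(tokens):                    # presence, not raw counts
            df_total[t] += 1
            df_class[c][t] += 1

    scores = {}
    for t, df in df_total.items():
        p_t = (df + 1) / (n_docs + 2)            # smoothed P(t)
        # Per-class discrimination vector: one PMI value per class.
        vec = [math.log(((df_class[c][t] + 1) / (class_size[c] + 2)) / p_t)
               for c in classes]
        mean = sum(vec) / len(vec)
        # High variance means the term is strongly tied to some classes and not others.
        scores[t] = sum((v - mean) ** 2 for v in vec) / len(vec)
    return scores

def select_features(docs, labels, k):
    """Return the k terms with the highest variance (ties broken alphabetically)."""
    scores = vm_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

    On a toy corpus with two classes, words that occur in both documents of one class (for example "goal" and "market") outrank words seen only once (for example "match"). This fits the abstract's point that variance separates class-specific words from weakly informative ones.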
  • Keywords
    pattern classification; text analysis; class discrimination value vector; mutual information method; text classification; training documents; variance-mean based feature selection; Application software; Bayesian methods; Computer science; Computer science education; Educational technology; Frequency; Mutual information; Niobium; Probability; Text categorization; feature selection; text classification; variance-mean;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-1-4244-3581-4
  • Type
    conf
  • DOI
    10.1109/ETCS.2009.646
  • Filename
    4959366