• DocumentCode
    615260
  • Title

    Improved mutual information method for text feature selection

  • Author

    Ding Xiaoming ; Tang Yan

  • Author_Institution
    Coll. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
  • fYear
    2013
  • fDate
    26-28 April 2013
  • Firstpage
    163
  • Lastpage
    166
  • Abstract
    Reducing the dimensionality of a high-dimensional feature set is one of the main difficulties in text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Prior research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class. An experimental platform was built by constructing a Chinese text classification system, and multiple sets of experiments were run on this system. The results show that the new feature selection approach achieves better results in text categorization.
  • Keywords
    computational complexity; feature extraction; natural language processing; pattern classification; text analysis; Chinese text classification system; computing complexity; corpus category; feature frequency; high-dimensional feature set dimension reduction; improved mutual information method; text categorization; text classification; text feature selection approach; Art; Complexity theory; Computers; feature selection; mutual information
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education (ICCSE), 2013 8th International Conference on
  • Conference_Location
    Colombo
  • Print_ISBN
    978-1-4673-4464-7
  • Type

    conf

  • DOI
    10.1109/ICCSE.2013.6553903
  • Filename
    6553903