• DocumentCode
    537586
  • Title

    Improved Feature Selection Algorithm Based on Concentration and Dispersion

  • Author

    Shen You-Wen ; Zhao Xin-Jian

  • Author_Institution
    Coll. of Comput. Sci. & Technol., Zhejiang Univ. of Technol., Hangzhou, China
  • Volume
    1
  • fYear
    2010
  • fDate
    23-24 Oct. 2010
  • Firstpage
    262
  • Lastpage
    265
  • Abstract
    This paper analyzes the concentration and dispersion measures of the integrated feature selection algorithm (TFFS) and identifies their shortcomings: concentration has difficulty measuring the weight of low-frequency terms, and dispersion ignores the impact of terms whose mutual information is negative. A modified feature selection algorithm (TFFSL) is proposed, which improves both concentration and dispersion and takes term length as a weighting factor. SVM classification experiments show that, compared with TFFS, TFFSL achieves higher accuracy and is better at eliminating irrelevant terms.
  • Keywords
    feature extraction; information management; pattern classification; support vector machines; text analysis; SVM classification; TFFSL algorithm; feature selection algorithm; mutual information; feature selection; feature weight; support vector machine; text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Mining (WISM), 2010 International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-1-4244-8438-6
  • Type

    conf

  • DOI
    10.1109/WISM.2010.28
  • Filename
    5662323