• DocumentCode
    2898546
  • Title
    Two-Stage Feature Selection Method for Text Classification
  • Author
    Li Xi ; Dai Hang ; Wang Mingwen
  • Author_Institution
    Sch. of Math. & Comput. Sci., Jiangxi Sci. & Technol. Normal Univ., Nanchang, China
  • Volume
    1
  • fYear
    2009
  • fDate
    18-20 Nov. 2009
  • Firstpage
    234
  • Lastpage
    238
  • Abstract
    Dimension reduction is the process of reducing the number of features under consideration, and can be divided into feature selection and feature extraction. This paper proposes a two-stage feature selection method based on the Regularized Least Squares-Multi Angle Regression and Shrinkage (RLS-MARS) model. In the first stage, a new weighting method, Term Frequency Inverse Document and Category Frequency Collection normalization (TF-IDCFC), is applied to measure the features and select the important ones, using category information as a factor. In the second stage, the RLS-MARS model is used to select the relevant information, where Regularized Least Squares (RLS) combined with Least Angle Regression and Shrinkage (LARS) can be viewed as an efficient approach. Experiments on the Fudan University Chinese Text Classification Corpus and 20 Newsgroups demonstrate the effectiveness of the new feature selection method for text classification with two classical classifiers: KNN and SVMLight.
  • Keywords
    classification; least squares approximations; natural language processing; regression analysis; text analysis; RLS-MARS; TF-IDCFC; dimension reduction; feature selection; least angle regression and shrinkage; regularized least squares-multi angle regression and shrinkage; term frequency inverse document and category frequency collection normalization; text classification; Computer security; Data mining; Feature extraction; Frequency; Information security; Least squares methods; Mathematics; Resonance light scattering; Space technology; Text categorization; LARS; RLS
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Information Networking and Security, 2009. MINES '09. International Conference on
  • Conference_Location
    Hubei
  • Print_ISBN
    978-0-7695-3843-3
  • Electronic_ISBN
    978-1-4244-5068-8
  • Type
    conf
  • DOI
    10.1109/MINES.2009.127
  • Filename
    5368360
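
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the TF-IDCFC formula, so stage 1 uses a hypothetical category-aware TF-IDF-style weight as a stand-in. Stage 2 uses incremental forward stagewise regression, a simpler relative of LARS, in place of the RLS-MARS model. All function and parameter names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def stage1_scores(docs, labels):
    """Score terms with a category-aware TF-IDF-style weight.

    Hypothetical stand-in for TF-IDCFC: terms that occur in few
    documents and few categories receive a higher weight.
    """
    n_docs = len(docs)
    n_cats = len(set(labels))
    tf = Counter()            # total term frequency over the collection
    df = Counter()            # number of documents containing the term
    cats = defaultdict(set)   # categories in which the term occurs
    for tokens, label in zip(docs, labels):
        tf.update(tokens)
        for term in set(tokens):
            df[term] += 1
            cats[term].add(label)
    return {
        t: tf[t]
        * math.log(1 + n_docs / df[t])
        * math.log(1 + n_cats / len(cats[t]))
        for t in tf
    }


def stage2_select(X, y, k, step=0.01, max_iter=5000):
    """Return up to k column indices of X in the order they enter the model.

    Incremental forward stagewise regression: repeatedly nudge the
    coefficient of the column most correlated with the residual.
    This is an assumption standing in for RLS-MARS, not that model.
    """
    n, p = len(X), len(X[0])
    residual = list(y)
    selected = []
    for _ in range(max_iter):
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda c: abs(corr[c]))
        if abs(corr[j]) < 1e-9:
            break  # residual is (numerically) uncorrelated with every column
        if j not in selected:
            if len(selected) == k:
                break  # a new column wants to enter but the budget is used up
            selected.append(j)
        delta = step if corr[j] > 0 else -step
        for i in range(n):
            residual[i] -= delta * X[i][j]
    return selected


def two_stage_select(docs, labels, target, keep_stage1, keep_stage2):
    """Stage 1 filters the vocabulary by weight; stage 2 refines it by regression."""
    scores = stage1_scores(docs, labels)
    # Sort by descending score, breaking ties alphabetically for determinism.
    vocab = sorted(scores, key=lambda t: (-scores[t], t))[:keep_stage1]
    X = [[tokens.count(t) for t in vocab] for tokens in docs]
    y = [1.0 if label == target else -1.0 for label in labels]
    return [vocab[j] for j in stage2_select(X, y, keep_stage2)]
```

Here `docs` is a list of token lists and `labels` the matching category names; `two_stage_select(docs, labels, "sport", 3, 2)` would keep the three highest-weighted terms and then return at most two of them for the one-vs-rest target `"sport"`. The selected terms could then be passed to a classifier such as KNN or an SVM, which this sketch leaves out.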