  • DocumentCode
    1968649
  • Title
    Study of text classification methods for data sets with huge features
  • Author
    Wei, Guiying; Gao, Xuedong; Wu, Sen
  • Author_Institution
    Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China
  • Volume
    1
  • fYear
    2010
  • fDate
    10-11 July 2010
  • Firstpage
    433
  • Lastpage
    436
  • Abstract
    Text classification has attracted rapidly growing interest over the past few years. This paper reviews the main approaches that have been taken towards text classification. The key techniques are discussed, including text models, feature selection methods, and text classification algorithms. The work focuses on the implementation of a text classification system based on Mutual Information, the K-Nearest Neighbor algorithm, and the Support Vector Machine. Experimental results on the Reuters collection are also presented. They show that Mutual Information is an efficient dimension reduction method for text data sets with a huge number of features.
  • Keywords
    feature extraction; pattern classification; support vector machines; text analysis; K-Nearest Neighbor algorithm; Reuters collection; dimension reduction method; feature selection methods; huge feature data sets; mutual information; support vector machine; text classification algorithms; text model; Indexing; K-Nearest Neighbor; Text classification; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial and Information Systems (IIS), 2010 2nd International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-7860-6
  • Type
    conf
  • DOI
    10.1109/INDUSIS.2010.5565817
  • Filename
    5565817