• DocumentCode
    3301584
  • Title

    An Improved Approach to Terms Weighting in Text Classification

  • Author

    Ma Zhanguo ; Feng Jing ; Chen Liang ; Hu Xiangyi ; Shi Yanqin

  • Author_Institution
    Beijing Sci. & Technol. Inf. Inst., Beijing, China
  • fYear
    2011
  • fDate
    19-21 May 2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Most traditional text classification methods use term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and to compute their weights when classifying a text document. Term weighting plays an important role in achieving high performance in text classification. Although the tf-idf model is popular, it does not incorporate the class information of the terms. This paper proposes an improved tf-idf-ci model for computing term weights, which combines intra class information and inner class information. The experimental results show that performance is enhanced: the role of important and representative terms is raised, and the effect of unimportant feature terms on classification is decreased. In addition, the F1 score obtained with the tf-idf-ci algorithm is higher than that obtained with the traditional tf-idf model.
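    For reference, the traditional tf-idf weighting that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of standard tf-idf only, not the paper's tf-idf-ci model, whose formula the abstract does not give. The function name `tf_idf`, the toy documents, and the choice of length-normalized tf with an unsmoothed natural-log idf are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(N / df), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: one finance document and two sports documents.
docs = [
    ["stock", "market", "price"],
    ["stock", "match", "goal"],
    ["goal", "team", "match"],
]
weights = tf_idf(docs)
```

    In this sketch a term's weight depends only on how many documents contain it, not on which classes those documents belong to. That is the limitation the abstract points to when it says tf-idf does not incorporate class information.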
  • Keywords
    pattern classification; text analysis; inner class information; intra class information; inverse document frequency; term frequency; terms weighting; text document classification; tf-idf model; Analytical models; Classification algorithms; Computational modeling; Machine learning; Support vector machine classification; Text categorization; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Management (CAMAN), 2011 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-9282-4
  • Type

    conf

  • DOI
    10.1109/CAMAN.2011.5778755
  • Filename
    5778755