• DocumentCode
    3001839
  • Title
    An Improved Algorithm to Term Weighting in Text Classification
  • Author
    Li, Ran ; Guo, Xianjiu
  • Author_Institution
    Inf. Eng. Coll., Dalian Ocean Univ., Dalian, China
  • fYear
    2010
  • fDate
    29-31 Oct. 2010
  • Firstpage
    1
  • Lastpage
    3
  • Abstract
    The traditional TF-IDF algorithm is a common method for measuring feature weight in text categorization. However, the algorithm does not take into account how feature terms are distributed between classes (inter-class) and within a class (intra-class). Consequently, it cannot effectively weigh the distribution proportion of feature terms. To solve this problem, the information entropy of the inter-class and intra-class distributions of feature terms is used to revise the TF-IDF weight. Simulation experiments demonstrate that the improved TF-IDF algorithm achieves better classification results than the traditional TF-IDF algorithm.
  • Keywords
    classification; entropy; text analysis; TF-IDF algorithm; distribution proportion; feature weight; information entropy; term weighting; text categorization; text classification; Accuracy; Biological system modeling; Classification algorithms; Entropy; Information entropy; Manganese; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Technology (ICMT), 2010 International Conference on
  • Conference_Location
    Ningbo
  • Print_ISBN
    978-1-4244-7871-2
  • Type
    conf
  • DOI
    10.1109/ICMULT.2010.5630962
  • Filename
    5630962
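
The abstract describes revising the TF-IDF weight with the information entropy of a term's inter-class and intra-class distributions, but this record does not give the paper's formula. The Python sketch below is therefore only an illustration of the general idea under stated assumptions, not the authors' method: the function names, the normalization by the maximum entropy, and the multiplicative combination are all hypothetical. Documents are assumed to be lists of tokens with one class label each.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def idf(term, docs):
    """Plain inverse document frequency: log(N / df)."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def inter_class_factor(term, docs, labels):
    """Hypothetical factor in [0, 1]: 1 when the term occurs in only one
    class, 0 when it is spread evenly over all classes."""
    classes = sorted(set(labels))
    if len(classes) < 2:
        return 1.0
    per_class = [
        sum(d.count(term) for d, y in zip(docs, labels) if y == c)
        for c in classes
    ]
    return 1.0 - entropy(per_class) / math.log(len(classes))


def intra_class_factor(term, docs, labels, cls):
    """Hypothetical factor in [0, 1]: 1 when the term is spread evenly over
    the documents of class `cls`, 0 when it sits in a single document."""
    in_class = [d.count(term) for d, y in zip(docs, labels) if y == cls]
    if len(in_class) < 2:
        return 1.0
    return entropy(in_class) / math.log(len(in_class))


def revised_weight(term, doc_index, docs, labels):
    """Assumed combination: TF * IDF scaled by both entropy factors."""
    tf = docs[doc_index].count(term)
    return (
        tf
        * idf(term, docs)
        * inter_class_factor(term, docs, labels)
        * intra_class_factor(term, docs, labels, labels[doc_index])
    )


# Toy corpus (made up for illustration only).
docs = [["fish", "sea"], ["fish", "boat"], ["stock", "sea"], ["stock", "bank"]]
labels = ["ocean", "ocean", "finance", "finance"]

print(revised_weight("fish", 0, docs, labels))  # concentrated in one class
print(revised_weight("sea", 0, docs, labels))   # spread evenly across classes
```

In this toy corpus, "fish" and "sea" have the same plain TF-IDF weight in the first document, since each appears in two of the four documents. Under the assumed revision, "sea" is driven to zero because it is split evenly between the two classes, while "fish" keeps its weight because it appears only in the "ocean" class and in every document of that class.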