• DocumentCode
    2665884
  • Title
    A new weighting algorithm for linear classifier
  • Author
    Chen, Keli ; Zong, Chengqing
  • Author_Institution
    Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
  • fYear
    2003
  • fDate
    26-29 Oct. 2003
  • Firstpage
    650
  • Lastpage
    655
  • Abstract
    In text categorization (TC), the TF*IDF (term frequency * inverse document frequency) and TF*IWF*IWF weighting algorithms are widely used. However, both algorithms are overly biased by term frequency and neglect the imbalance between classes. In this paper, we propose a new weighting algorithm, TF*IWF*IWF*VE (term frequency * inverse word frequency * inverse word frequency * variance and expectation). The new algorithm improves on the TF*IWF*IWF weighting algorithm in both the TF and VE terms. This paper compares the new algorithm with the TF*IWF*IWF algorithm in both theory and experiment. In a preliminary experiment, the F1-measure improved by 11.78%.
  • Keywords
    classification; computational linguistics; text analysis; inverse document frequency; inverse word frequency; term frequency; text categorization; text classifier; weighting algorithm; Automation; Equations; Frequency; Laboratories; Pattern recognition; Statistics; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
  • Conference_Location
    Beijing, China
  • Print_ISBN
    0-7803-7902-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2003.1275987
  • Filename
    1275987