• DocumentCode
    2307504
  • Title
    A Redundancy Based Term Weighting Approach for Text Categorization
  • Author
    Lu, Zhen-Yu ; Lin, Yong-Min ; Zhao, Shuang ; Chen, Jing-Nian ; Zhu, Wei-Dong
  • Author_Institution
    Coll. of Econ. & Manage., Hebei Polytech. Univ., Tangshan, China
  • Volume
    2
  • fYear
    2009
  • fDate
    19-21 May 2009
  • Firstpage
    36
  • Lastpage
    40
  • Abstract
    With the rapid development of the World Wide Web, text categorization has come to play an important role in organizing and processing large amounts of text data. TF·IDF is a simple and fast term weighting method that is widely used in text categorization. Its drawback is that a large weight may be assigned to rarely occurring terms regardless of their posterior distribution. This paper presents a redundancy-based term weighting method that addresses this problem by taking the posterior probability distribution into consideration. Experiments on Reuters-21578 and on a Chinese corpus provided by the Computer and Information Technology Data Center of Fudan University show that this weighting method performs better than TF·IDF.
  • Keywords
    Internet; information retrieval; statistical distributions; text analysis; TF-IDF; World Wide Web; inverse document frequency; posterior probability distribution; redundancy-based term weighting; term frequency; text categorization; Educational institutions; Engineering management; Frequency measurement; Information technology; Organizing; Probability distribution; Software development management; Software engineering; Text categorization; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, 2009. WCSE '09. WRI World Congress on
  • Conference_Location
    Xiamen
  • Print_ISBN
    978-0-7695-3570-8
  • Type
    conf
  • DOI
    10.1109/WCSE.2009.191
  • Filename
    5319713