DocumentCode :
2307504
Title :
A Redundancy Based Term Weighting Approach for Text Categorization
Author :
Lu, Zhen-Yu ; Lin, Yong-Min ; Zhao, Shuang ; Chen, Jing-Nian ; Zhu, Wei-Dong
Author_Institution :
Coll. of Econ. & Manage., Hebei Polytech. Univ., Tangshan, China
Volume :
2
fYear :
2009
fDate :
19-21 May 2009
Firstpage :
36
Lastpage :
40
Abstract :
With the rapid development of the World Wide Web, text categorization has come to play an important role in organizing and processing large amounts of text data. TF·IDF is a simple and fast term weighting method that is widely used in text categorization. Its drawback is that a large weight may be assigned to rarely occurring terms regardless of their posterior distribution. This paper presents a redundancy-based term weighting method that addresses this problem by taking the posterior probability distribution into consideration. Experiments on Reuters-21578 and on a Chinese corpus provided by the Computer and Information Technology Data Center of Fudan University show that this weighting method performs better than TF·IDF.
Keywords :
Internet; information retrieval; statistical distributions; text analysis; TF-IDF; World Wide Web; inverse document frequency; posterior probability distribution; redundancy-based term weighting; term frequency; text categorization; Educational institutions; Engineering management; Frequency measurement; Information technology; Organizing; Probability distribution; Software development management; Software engineering; Text categorization; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Engineering, 2009. WCSE '09. WRI World Congress on
Conference_Location :
Xiamen
Print_ISBN :
978-0-7695-3570-8
Type :
conf
DOI :
10.1109/WCSE.2009.191
Filename :
5319713