DocumentCode :
3478662
Title :
An improved TF-IDF weights function based on information theory
Author :
Wang, Na ; Wang, Pengyuan ; Zhang, Baowei
Author_Institution :
Dept. of Electron. & Commun., Zhengzhou Inst. of Aeronaut. Ind. Manage., Zhengzhou, China
Volume :
3
fYear :
2010
fDate :
12-13 June 2010
Firstpage :
439
Lastpage :
441
Abstract :
The Vector Space Model (VSM) is currently a typical method for describing text features in text classification. It uses TF-IDF weights to compute the term weight in each dimension of the text feature vector. However, TF-IDF considers only the relationship between a term and the whole text and neglects the relationship between different terms. To address this problem, an improved TF-IDF weight function is proposed that uses the distribution information of terms among classes and inside a class. Experiments show that the improved method is feasible and effective and that it greatly improves the accuracy of text categorization.
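For orientation, the baseline TF-IDF weighting that the abstract refers to can be sketched as follows. This is a minimal illustration of the classic formulation only (term frequency times the logarithm of inverse document frequency); the improved weight function proposed in the paper is not specified in this record and is not reproduced here. The function name `tf_idf` and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic TF-IDF (an assumed baseline, not the paper's improved function):
    weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["wheat", "yield", "soil"],
    ["wheat", "price", "market"],
    ["soil", "moisture", "sensor"],
]
weights = tf_idf(docs)
```

Note that each weight depends only on the term's count in its own document and on how many documents contain it. Class labels play no role, which is the limitation the abstract says the improved function addresses by using the distribution among classes and inside a class.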
Keywords :
information theory; pattern classification; text analysis; TF-IDF weights function; information theory; inverse document frequency; term weighting; text classification; text frequency; vector space model; Biology; Function; Information Theory; TF-IDF Weights; Text Categorization; Vector Space Model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Communication Technologies in Agriculture Engineering (CCTAE), 2010 International Conference On
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6944-4
Type :
conf
DOI :
10.1109/CCTAE.2010.5544382
Filename :
5544382