DocumentCode :
3107754
Title :
Improved feature selection approach TFIDF in text mining
Author :
Jing, Li-Ping ; Huang, Hou-Kuan ; Shi, Hong-Bo
Author_Institution :
Sch. of Comput. & Inf. Technol., Northern JiaoTong Univ., Beijing, China
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
944
Abstract :
This paper describes the TFIDF (term frequency, inverse document frequency) feature selection method. We use it to process the source data and build a vector space model, which provides a convenient data structure for text categorization. We measure the precision of the method from the categorization results. Based on these empirical results, we analyze its advantages and disadvantages and present a new TFIDF-based feature selection approach to improve its accuracy.
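For orientation, here is a minimal sketch of the baseline TFIDF weighting that the abstract refers to, in one common textbook form: term frequency normalized by document length, multiplied by log(N / document frequency). This is an assumption about the variant, not the paper's own formula, and it does not reproduce the improved approach the authors propose. The function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF weight vector per document.

    Assumed variant (not taken from the paper):
    tf = count / document length, idf = log(N / document frequency).
    """
    # Naive whitespace tokenization; real preprocessing would do more.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


# Hypothetical toy corpus, for illustration only.
docs = ["text mining text", "data mining", "text categorization"]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is the vector space model representation of one document, with columns ordered as in `vocab`. A term that occurs in every document gets weight zero under this variant, which is one of the known limitations of plain TFIDF.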
Keywords :
classification; data mining; data structures; feature extraction; indexing; TFIDF method; data structure; evaluation function; feature selection; inverse document frequency; term frequency; text categorization; text mining; vector space model; Classification algorithms; Data preprocessing; Frequency; Learning systems; Mutual information
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1174522
Filename :
1174522