Title :
Improved feature selection approach TFIDF in text mining
Author :
Jing, Li-Ping ; Huang, Hou-Kuan ; Shi, Hong-Bo
Author_Institution :
Sch. of Comput. & Inf. Technol., Northern JiaoTong Univ., Beijing, China
Abstract :
This paper describes the TFIDF (term frequency-inverse document frequency) feature selection method. We use it to process the source data and build a vector space model, which provides a convenient data structure for text categorization. We measure the precision of the method from the categorization results. Based on these empirical results, we analyze its advantages and disadvantages and present a new TFIDF-based feature selection approach to improve its accuracy.
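For orientation, the following is a minimal sketch of the classic TFIDF weighting and the resulting vector space model, not the improved approach the paper proposes (the abstract does not specify it). The function name `tfidf_vectors`, the length-normalized term frequency, and the unsmoothed `log(N / df)` form are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TFIDF vectors for tokenized documents (illustrative helper).

    docs: list of token lists.
    Returns (vocabulary, vectors), where each vector holds one weight
    per vocabulary term, in vocabulary order.
    """
    n_docs = len(docs)
    # Sorted vocabulary gives every document the same vector layout.
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([
            # Assumed form: tf = count / document length, idf = log(N / df).
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


docs = [["text", "mining", "text"], ["data", "mining"]]
vocab, vectors = tfidf_vectors(docs)
```

In this sketch a term that occurs in every document, such as "mining" here, receives weight zero, which illustrates why TFIDF can serve as a feature selection criterion: terms with low weights carry little information for separating documents.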
Keywords :
classification; data mining; data structures; feature extraction; indexing; TFIDF method; data structure; evaluation function; feature selection; inverse document frequency; term frequency; text categorization; text mining; vector space model; Classification algorithms; Data preprocessing; Frequency; Learning systems; Mutual information
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
DOI :
10.1109/ICMLC.2002.1174522