DocumentCode :
2665838
Title :
A novel weighting formula and feature selection for text classification based on rough set theory
Author :
Hu, Qinghua ; Yu, Daren ; Duan, Yanfeng ; Bao, Wen
Author_Institution :
Harbin Inst. of Technol., China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
638
Lastpage :
645
Abstract :
Weighting formulas and feature selection are key preprocessing steps in text classification and mining. We analyze the drawbacks of weighting formulas based on inverse document frequency (IDF) and present a novel feature weighting and selection method based on the variable precision rough set model. IDF does not take class information into account, and an IDF-based criterion is not monotonic with respect to the contribution a feature makes to classification, which degrades classifier performance. The measure of classification quality based on the variable precision rough set model can handle complex classification problems and measures the contribution a feature makes to classification. We introduce it as a criterion for feature selection and weighting in text classification and name the resulting method TFACQ. Experimental results show that the weighting formula and feature selection based on TFACQ greatly improve performance.
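The abstract's central contrast is that IDF ignores class labels while a variable precision rough set (VPRS) quality-of-classification measure does not. The small sketch below illustrates that contrast. It computes Ziarko's beta-quality of classification (gamma_beta) for a single binarized term and, for comparison, standard IDF. The following are illustrative assumptions: the present/absent binarization, beta = 0.8, the toy corpus, and all function names. `tf_quality_weight` is a hypothetical TF x quality product chosen by analogy with TF x IDF. It is not the TFACQ formula, which this record does not give.

```python
import math
from collections import Counter, defaultdict


def idf(df: int, n_docs: int) -> float:
    """Standard inverse document frequency; uses no class information."""
    return math.log(n_docs / df)


def vprs_quality(values, labels, beta: float = 0.8) -> float:
    """Beta-quality of classification for one condition attribute (VPRS).

    values: discretized feature value per document (assumed: 1 = term present, 0 = absent)
    labels: class label per document
    An equivalence class (documents sharing a value) falls in the beta-positive
    region if some class covers at least a beta fraction of it. With beta > 0.5,
    at most one class can qualify, so each block is counted at most once.
    """
    assert 0.5 < beta <= 1.0, "VPRS requires beta in (0.5, 1]"
    blocks = defaultdict(list)
    for v, y in zip(values, labels):
        blocks[v].append(y)
    positive = 0
    for ys in blocks.values():
        majority_count = Counter(ys).most_common(1)[0][1]
        if majority_count / len(ys) >= beta:
            positive += len(ys)
    return positive / len(labels)


def tf_quality_weight(tf: int, quality: float) -> float:
    # HYPOTHETICAL weighting: term frequency times VPRS quality, by analogy
    # with TF x IDF. Not confirmed as the paper's TFACQ formula.
    return tf * quality


if __name__ == "__main__":
    # Toy corpus (assumed): 4 "sport" and 4 "politics" documents.
    labels = ["sport"] * 4 + ["politics"] * 4
    term_a = [1, 1, 1, 1, 0, 0, 0, 0]  # occurs only in sport documents
    term_b = [1, 1, 0, 0, 1, 1, 0, 0]  # split evenly across both classes
    # Both terms occur in 4 of 8 documents, so IDF cannot tell them apart.
    print(idf(sum(term_a), len(labels)), idf(sum(term_b), len(labels)))
    # The VPRS quality separates them: 1.0 for term_a, 0.0 for term_b.
    print(vprs_quality(term_a, labels), vprs_quality(term_b, labels))
```

Both toy terms have the same document frequency and therefore identical IDF, yet only the first one discriminates between the classes. This is the kind of gap the abstract attributes to IDF-based weighting.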
Keywords :
classification; data mining; feature extraction; rough set theory; text analysis; feature selection; inverse document frequency; text classification; text mining; variable precision rough set model; weighting formula; Automatic testing; Buildings; Data mining; Frequency; Information retrieval; Pattern recognition; Set theory; Statistical analysis; Statistics; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275985
Filename :
1275985