DocumentCode :
2641045
Title :
A Approach for Text Classification Feature Dimensionality Reduction and Rule Generation on Rough Set
Author :
Yin, Shiqun ; Huang, ZhiXing ; Chen, Lu ; Qiu, Yuhui
Author_Institution :
Fac. of Comput. & Inf. Sci., Southwest Univ., Chongqing
fYear :
2008
fDate :
18-20 June 2008
Firstpage :
554
Lastpage :
554
Abstract :
High-dimensional data are frequently encountered in Web text classification. Mining high-dimensional data is extraordinarily difficult because of the curse of dimensionality, so feature dimensionality reduction is needed. This paper presents an attribute reduction algorithm based on rough set theory that reduces the number of text feature terms and extracts classification rules. First, the weights of the feature terms are discretized. Then, a decision table is built with the discretized weights as the condition attributes and the text classes as the decision attributes. Finally, the classification rules are extracted by attribute reduction. The method is simple and feasible. It improves the efficiency of the selected feature subset and is suitable for high-volume text classification. The extracted rules are easy to understand. The accuracy is higher and classification is faster than with classification based on vector space comparison. This paper describes the proposed technique and provides experimental results.
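The three steps named in the abstract (discretize the weights, build a decision table, reduce attributes and read off rules) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not specify the reduction procedure, so a greedy dependency-based search in the style of QuickReduct is swapped in here. The cut points, the toy documents, and all function names are hypothetical.

```python
from collections import defaultdict

def discretize(weight, cuts=(0.1, 0.5)):
    """Map a real-valued term weight to an integer level (cut points are assumed)."""
    return sum(weight >= c for c in cuts)

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, (cond, _) in enumerate(rows):
        blocks[tuple(cond[a] for a in attrs)].append(i)
    return blocks

def dependency(rows, attrs):
    """Share of rows in the positive region: classes whose rows agree on the decision."""
    pos = 0
    for idxs in partition(rows, attrs).values():
        if len({rows[i][1] for i in idxs}) == 1:
            pos += len(idxs)
    return pos / len(rows)

def greedy_reduct(rows, n_attrs):
    """Greedily add the attribute that most raises the dependency degree
    until it matches that of the full attribute set (not guaranteed minimal)."""
    full = dependency(rows, list(range(n_attrs)))
    reduct = []
    while dependency(rows, reduct) < full:
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)

def extract_rules(rows, reduct):
    """Emit one rule per consistent equivalence class over the reduct attributes."""
    rules = []
    for key, idxs in partition(rows, reduct).items():
        decisions = {rows[i][1] for i in idxs}
        if len(decisions) == 1:
            rules.append((dict(zip(reduct, key)), decisions.pop()))
    return rules

# Toy data (made up): weights for three feature terms per document, plus a class label.
docs = [
    ((0.80, 0.60, 0.00), "sports"),
    ((0.70, 0.20, 0.05), "sports"),
    ((0.00, 0.60, 0.90), "finance"),
    ((0.05, 0.00, 0.70), "finance"),
]

# Decision table: discretized weights are condition attributes, the label is the decision.
table = [(tuple(discretize(w) for w in weights), label) for weights, label in docs]
reduct = greedy_reduct(table, 3)
rules = extract_rules(table, reduct)

for conditions, label in rules:
    print(conditions, "->", label)
```

Each rule maps attribute indices in the reduct to a discretized weight level and a class. On this toy table a single feature term is enough to separate the two classes, which is the kind of dimensionality reduction the abstract describes.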
Keywords :
data mining; rough set theory; text analysis; Web text classification; attribute reduction algorithm; classification rules; decision attributes; feature dimensionality reduction; high dimensional data mining; rough set theory; rule extraction; rule generation; vector space comparison; Data mining; Feature extraction; Information filtering; Information retrieval; Information science; Internet; Search engines; Set theory; Space technology; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
Type :
conf
DOI :
10.1109/ICICIC.2008.7
Filename :
4603742