Title :
A Approach for Text Classification Feature Dimensionality Reduction and Rule Generation on Rough Set
Author :
Yin, Shiqun ; Huang, ZhiXing ; Chen, Lu ; Qiu, Yuhui
Author_Institution :
Fac. of Comput. & Inf. Sci., Southwest Univ., Chongqing
Abstract :
High-dimensional data are frequently encountered in Web text classification. Mining such data is extraordinarily difficult because of the curse of dimensionality, so feature dimensionality reduction is needed. This paper presents an attribute reduction algorithm based on rough set theory that reduces the set of text feature terms and extracts classification rules. First, the weights of the feature terms are discretized. Then, a decision table is built with the discretized weights as the condition attributes and the text classes as the decision attributes. Finally, the classification rules are extracted by attribute reduction. The method is simple and feasible. It improves the efficiency of the selected feature subset and is suitable for high-volume text classification. The extracted rules are easy to understand. Compared with classification based on vector space comparison, accuracy is higher and classification is faster. The paper describes the proposed technique and provides experimental results.
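The three steps named in the abstract (discretize the term weights, build a decision table, reduce attributes and read off rules) can be illustrated with a minimal sketch. It is not the authors' algorithm: the cut points, the toy weights, the greedy dependency-based reduct search, and all function names are assumptions chosen for illustration only.

```python
from collections import defaultdict

def discretize(weight, cuts=(0.1, 0.5)):
    """Map a real-valued term weight to a level 0, 1, 2 (cut points are assumed)."""
    return sum(weight >= c for c in cuts)

def dependency(rows, labels, attrs):
    """Share of rows whose values on `attrs` determine the class uniquely,
    i.e. the size of the rough-set positive region over the table size."""
    classes, counts = defaultdict(set), defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(classes[key]) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Greedy heuristic (an assumption, not the paper's method): keep adding the
    attribute that raises the dependency most until it matches the full table."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

def extract_rules(rows, labels, reduct):
    """One rule per consistent combination of reduct values: values -> class."""
    classes = defaultdict(set)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in reduct)].add(label)
    return {key: next(iter(v)) for key, v in classes.items() if len(v) == 1}

# Toy term weights for three feature terms (made-up numbers for illustration).
weights = [
    [0.80, 0.00, 0.30],
    [0.70, 0.20, 0.00],
    [0.00, 0.90, 0.30],
    [0.05, 0.60, 0.00],
]
labels = ["sports", "sports", "finance", "finance"]

# Decision table: discretized weights are the condition attributes,
# the text class is the decision attribute.
table = [[discretize(w) for w in row] for row in weights]
reduct = greedy_reduct(table, labels)
rules = extract_rules(table, labels, reduct)
```

In this toy table the first term alone separates the two classes, so the sketch keeps a single condition attribute and produces one rule per class. A greedy search of this kind returns a small attribute subset but is not guaranteed to find a minimal reduct.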
Keywords :
data mining; rough set theory; text analysis; Web text classification; attribute reduction algorithm; classification rules; decision attributes; feature dimensionality reduction; high dimensional data mining; rule extraction; rule generation; vector space comparison; Data mining; Feature extraction; Information filtering; Information retrieval; Information science; Internet; Search engines; Set theory; Space technology; Text categorization
Conference_Title :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
DOI :
10.1109/ICICIC.2008.7