DocumentCode
2641045
Title
A Approach for Text Classification Feature Dimensionality Reduction and Rule Generation on Rough Set
Author
Yin, Shiqun ; Huang, ZhiXing ; Chen, Lu ; Qiu, Yuhui
Author_Institution
Fac. of Comput. & Inf. Sci., Southwest Univ., Chongqing
fYear
2008
fDate
18-20 June 2008
Firstpage
554
Lastpage
554
Abstract
High-dimensional data are frequently encountered in Web text classification. Mining high-dimensional data is extremely difficult because of the curse of dimensionality, so feature dimensionality reduction is needed. This paper presents an attribute reduction algorithm based on rough set theory that reduces the text feature terms and extracts classification rules. First, the feature term weights are discretized. Next, a decision table is built with the weights as the condition attributes and the text classes as the decision attributes. Finally, the classification rules are extracted through attribute reduction. The method is simple and feasible; it improves the efficiency of the selected feature subset and is suitable for high-volume text classification. The extracted rules are easy to understand. Compared with classification based on vector space comparison, the method achieves higher accuracy and faster classification. This paper describes the proposed technique and provides experimental results.
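The pipeline in the abstract (discretize term weights, build a decision table, reduce attributes, extract rules) can be illustrated with a small sketch. The abstract does not say which discretization, reduction, or rule-matching procedure the authors use, so this sketch assumes equal-width binning, the greedy QuickReduct heuristic driven by the rough-set dependency degree, and first-match rule application. All function names (`fit_cuts`, `discretize`, `quick_reduct`, `extract_rules`, `classify`) and the toy corpus are hypothetical, and term weights are assumed to be precomputed (for example TF-IDF).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Doc = Dict[str, float]  # term -> weight (assumed precomputed, e.g. TF-IDF)
Row = Dict[str, int]    # term -> discrete level (condition attributes)
Rule = Tuple[Tuple[Tuple[str, int], ...], str]


def fit_cuts(docs: Sequence[Doc], terms: Sequence[str], bins: int) -> Dict[str, Tuple[float, float]]:
    """Assumed scheme: equal-width intervals per term, learned from training weights."""
    cuts = {}
    for t in terms:
        values = [d.get(t, 0.0) for d in docs]
        lo, hi = min(values), max(values)
        cuts[t] = (lo, (hi - lo) / bins or 1.0)  # width 1.0 if the term is constant
    return cuts


def discretize(doc: Doc, cuts: Dict[str, Tuple[float, float]], bins: int) -> Row:
    """Map each weight to a level in 0..bins-1, clamping values outside the training range."""
    return {t: max(0, min(int((doc.get(t, 0.0) - lo) / width), bins - 1))
            for t, (lo, width) in cuts.items()}


def partition(table: List[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(table: List[Row], labels: Sequence[str], attrs: Sequence[str]) -> float:
    """gamma_attrs(D) = |POS_attrs(D)| / |U|: share of rows in decision-consistent blocks."""
    pos = sum(len(b) for b in partition(table, attrs) if len({labels[i] for i in b}) == 1)
    return pos / len(table)


def quick_reduct(table: List[Row], labels: Sequence[str], attrs: Sequence[str]) -> List[str]:
    """Greedy QuickReduct: add the attribute that most raises gamma until it equals gamma
    of the full attribute set. Dependency is preserved, but minimality is not guaranteed."""
    target = dependency(table, labels, attrs)
    reduct: List[str] = []
    while dependency(table, labels, reduct) < target:
        remaining = [a for a in attrs if a not in reduct]
        reduct.append(max(remaining, key=lambda a: dependency(table, labels, reduct + [a])))
    return reduct


def extract_rules(table: List[Row], labels: Sequence[str], reduct: Sequence[str]) -> List[Rule]:
    """One certain rule per consistent block of IND(reduct): IF conditions THEN class."""
    rules: List[Rule] = []
    for block in partition(table, reduct):
        classes = {labels[i] for i in block}
        if len(classes) == 1:
            conditions = tuple((a, table[block[0]][a]) for a in reduct)
            rules.append((conditions, classes.pop()))
    return rules


def classify(rules: List[Rule], row: Row, default=None):
    """Return the class of the first rule whose conditions all match the row."""
    for conditions, label in rules:
        if all(row[a] == v for a, v in conditions):
            return label
    return default


# Hypothetical toy corpus: four documents with precomputed term weights.
terms = ["ball", "goal", "vote", "party"]
train = [
    {"ball": 1.0, "goal": 0.8},
    {"ball": 0.9, "vote": 0.2},
    {"vote": 1.0, "party": 0.9},
    {"vote": 0.8, "goal": 0.2},
]
labels = ["sport", "sport", "politics", "politics"]  # decision attribute

BINS = 2
cuts = fit_cuts(train, terms, BINS)
table = [discretize(d, cuts, BINS) for d in train]  # decision table rows
reduct = quick_reduct(table, labels, terms)
rules = extract_rules(table, labels, reduct)
print(reduct)  # ['ball']
print(rules)
print(classify(rules, discretize({"vote": 0.7, "party": 0.3}, cuts, BINS)))
```

On this toy data the greedy search keeps only `ball`, because that single term already makes every block of the decision table consistent; a document without it therefore falls under the `politics` rule. On a real corpus the reduct and rule set would be larger, and a greedy reduct is a practical stand-in for an exact minimal reduct, which is expensive to compute.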
Keywords
data mining; rough set theory; text analysis; Web text classification; attribute reduction algorithm; classification rules; decision attributes; feature dimensionality reduction; high dimensional data mining; rule extraction; rule generation; vector space comparison; Data mining; Feature extraction; Information filtering; Information retrieval; Information science; Internet; Search engines; Set theory; Space technology; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location
Dalian, Liaoning
Print_ISBN
978-0-7695-3161-8
Electronic_ISBN
978-0-7695-3161-8
Type
conf
DOI
10.1109/ICICIC.2008.7
Filename
4603742