DocumentCode :
2020271
Title :
Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features
Author :
Dai, Liuling ; Hu, Jinwu ; Liu, WanChun
Author_Institution :
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing
Volume :
1
fYear :
2008
fDate :
17-18 Oct. 2008
Firstpage :
182
Lastpage :
185
Abstract :
Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterized by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough sets to solve this problem. The features of the categories are selected in an aggressive manner, and the classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm clearly improves classification precision and is therefore promising.
Keywords :
feature extraction; rough set theory; text analysis; CHI square feature selection; redundant features; rough set; text categorization; text mining; Competitive intelligence; Computational intelligence; Computer science; Information retrieval; Information technology; Laboratories; Machine learning algorithms; Partial response channels; Support vector machines; SVM; feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3311-7
Type :
conf
DOI :
10.1109/ISCID.2008.178
Filename :
4725586