Title :
Text Feature Ranking Based on Rough-set Theory
Author :
Tan, Songbo ; Wang, Yuefen ; Cheng, Xueqi
Author_Institution :
Chinese Acad. of Sci., Beijing
Abstract :
With the aim of reducing dimensionality without sacrificing classification performance, the authors draw on attribute reduction based on the discernibility matrix in rough-set theory and propose two text feature selection algorithms, DB1 and DB2. The experimental results indicate that DB2 not only yields much higher accuracy than information gain when the number of features is smaller than 6000, but also requires much less CPU time than information gain.
Keywords :
rough set theory; text analysis; attribute reduction; discernibility matrix; information gain; rough-set theory; text feature ranking; text feature selection algorithm; Classification algorithms; Computers; Feature extraction; Frequency; Geology; Iron; Performance gain; Symmetric matrices; Text categorization; Vocabulary;
Conference_Title :
Web Intelligence, IEEE/WIC/ACM International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3026-0