DocumentCode :
3316856
Title :
Core-generating Approximate Minimum Entropy Discretization for Rough Set Feature Selection: An Experimental Investigation
Author :
Tian, David ; Keane, John ; Zeng, Xiao-Jun
Author_Institution :
Manchester Univ., Manchester
fYear :
2007
fDate :
23-26 July 2007
Firstpage :
1
Lastpage :
6
Abstract :
Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst keeping important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are termed reducts, and the intersection of all reducts is termed the core. Because RSFS works on discrete attributes only, the real-valued attributes of a dataset must be discretized before RSFS is applied. The core size of the discretized dataset is determined by the discretization process, and previous work has shown that this core size critically affects the performance of RSFS. This paper proposes a type of discretization termed core-generating approximate minimum entropy discretization (C-GAME), which selects a set of minimum entropy cuts capable of generating discrete data with nonempty cores. The paper defines C-GAME and then models it as a constraint satisfaction optimization problem, which is solved using the branch and bound algorithm. Experiments have been performed on two datasets from the UCI database to investigate the performance of C-GAME as a pre-processing step for RSFS. Results show that, for these datasets, C-GAME outperforms both the recursive minimal entropy partition discretization method (RMEP) and the original decision trees without feature selection.
Keywords :
constraint theory; minimisation; pattern classification; rough set theory; tree searching; C-GAME; branch-and-bound algorithm; classifier performance; constraint satisfaction optimization problem; core-generating approximate minimum entropy discretization; discretized dataset; rough set feature selection; Classification tree analysis; Constraint optimization; Decision trees; Discrete transforms; Entropy; Genetics; Partitioning algorithms; Rough sets; Set theory; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems Conference, 2007. FUZZ-IEEE 2007. IEEE International
Conference_Location :
London
ISSN :
1098-7584
Print_ISBN :
1-4244-1209-9
Electronic_ISBN :
1098-7584
Type :
conf
DOI :
10.1109/FUZZY.2007.4295437
Filename :
4295437