Title :
Core-generating Approximate Minimum Entropy Discretization for Rough Set Feature Selection: An Experimental Investigation
Author :
Tian, David ; Keane, John ; Zeng, Xiao-Jun
Author_Institution :
Manchester Univ., Manchester
Abstract :
Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes while keeping the important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are termed reducts, and the intersection of all reducts is termed the core. Because RSFS works only on discrete attributes, the real-valued attributes of a dataset must be discretized before RSFS is applied, so the discretization process determines the core size of the resulting discrete dataset. Previous work has shown that this core size critically affects the performance of RSFS. This paper proposes a type of discretization termed core-generating approximate minimum entropy discretization (C-GAME), which selects a set of minimum entropy cuts capable of generating discrete data with nonempty cores. The paper defines C-GAME and then models it as a constraint satisfaction optimization problem, which is solved using a branch-and-bound algorithm. Experiments on two datasets from the UCI Machine Learning Repository investigate the performance of C-GAME as a pre-processing step for RSFS. Results show that, for these datasets, C-GAME outperforms both the recursive minimal entropy partition discretization method (RMEP) and decision trees trained on the original data without feature selection.
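The reduct and core definitions above can be illustrated with a discernibility-matrix computation. The sketch below is a minimal illustration, not the authors' implementation: it assumes a consistent decision table whose condition attributes are already discrete, and it takes the core to be the set of attributes that appear as singleton entries of the discernibility matrix, i.e. attributes that are the only way to tell apart some pair of objects with different decisions. The function name `core` and the toy tables are hypothetical.

```python
from itertools import combinations


def core(rows, decisions):
    """Core of a consistent decision table (hypothetical helper, illustration only).

    rows: list of tuples of discrete condition-attribute values.
    decisions: list of decision values, one per row.
    An attribute belongs to the core if it is the only attribute on which
    some pair of objects with different decisions differ, i.e. it forms a
    singleton entry of the discernibility matrix.
    """
    result = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # same decision: this pair need not be discerned
        differing = [a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y]
        if len(differing) == 1:
            result.add(differing[0])
    return result


# Attribute 1 alone separates objects 0 and 1, so it must be in every reduct.
print(core([(0, 0, 0), (0, 1, 0), (1, 1, 1)], [0, 1, 1]))  # {1}
# Either attribute alone discerns the two objects: reducts {0} and {1}, empty core.
print(core([(0, 0), (1, 1)], [0, 1]))  # set()
```

In the second toy table each attribute on its own is a reduct, so the two reducts {0} and {1} have an empty intersection; this is the empty-core situation that a core-generating discretization aims to avoid.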
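The idea of choosing minimum entropy cuts subject to a nonempty-core constraint, searched by branch and bound, can be sketched on a toy problem. This is a deliberately simplified stand-in, not the C-GAME formulation from the paper, whose approximate cut selection and constraint model are not reproduced here. It assumes exactly one binary cut per real attribute, scores each cut by the weighted class entropy of the two resulting intervals (the criterion used by entropy-based methods such as RMEP), and accepts a set of cuts only if some pair of objects from different classes differs on exactly one discretized attribute (a singleton discernibility entry, which is a core attribute when the discretized table is consistent). All function names are hypothetical. The bound is the entropy of the cuts chosen so far plus the smallest possible entropy of every remaining attribute, so any branch that cannot beat the best feasible solution found so far is pruned.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def cut_entropy(values, labels, cut):
    """Weighted class entropy after splitting one attribute into x <= cut and x > cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def candidate_cuts(values):
    """Candidate cuts: midpoints between consecutive distinct values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def has_nonempty_core(rows, labels):
    """True if some pair with different labels differs on exactly one attribute."""
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = [a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y]
            if len(diff) == 1:
                return True
    return False


def select_cuts(data, labels):
    """Branch and bound over one cut per attribute (toy model, not C-GAME itself).

    Minimises the summed cut entropy subject to the discretized table having
    a nonempty core. Returns (cuts, total_entropy), or (None, inf) if no
    combination of cuts satisfies the constraint.
    """
    n_attr = len(data[0])
    columns = [[row[a] for row in data] for a in range(n_attr)]
    # options[a]: (entropy, cut) pairs for attribute a, cheapest first.
    options = [sorted((cut_entropy(col, labels, c), c) for c in candidate_cuts(col))
               for col in columns]
    assert all(options), "each attribute needs at least two distinct values"
    # rest[a]: optimistic (lowest possible) entropy of attributes a..n_attr-1.
    rest = [0.0] * (n_attr + 1)
    for a in range(n_attr - 1, -1, -1):
        rest[a] = rest[a + 1] + options[a][0][0]
    best = {"cost": math.inf, "cuts": None}

    def branch(a, chosen, cost):
        if cost + rest[a] >= best["cost"]:
            return  # bound: this branch cannot improve on the incumbent
        if a == n_attr:
            discrete = [tuple(int(row[k] > chosen[k]) for k in range(n_attr)) for row in data]
            if has_nonempty_core(discrete, labels):  # constraint checked at a leaf
                best["cost"], best["cuts"] = cost, list(chosen)
            return
        for e, c in options[a]:
            branch(a + 1, chosen + [c], cost + e)

    branch(0, [], 0.0)
    return best["cuts"], best["cost"]


data = [(1.0, 5.0), (2.0, 6.0), (3.0, 1.0), (4.0, 2.0)]
labels = ["a", "a", "b", "b"]
cuts, cost = select_cuts(data, labels)
print(cuts, round(cost, 3))  # [2.5, 1.5] 0.689
```

On this toy data the unconstrained optimum (cuts 2.5 and 3.5, total entropy 0) separates the classes perfectly on both attributes, but then every pair of objects from different classes differs on both attributes and the core is empty. The search therefore returns cuts 2.5 and 1.5 at a total entropy of about 0.689, trading a little entropy for a nonempty core.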
Keywords :
constraint theory; minimisation; pattern classification; rough set theory; tree searching; C-GAME; branch-and-bound algorithm; classifier performance; constraint satisfaction optimization problem; core-generating approximate minimum entropy discretization; discretized dataset; rough set feature selection; Classification tree analysis; Constraint optimization; Decision trees; Discrete transforms; Entropy; Genetics; Partitioning algorithms; Rough sets; Set theory; Spatial databases;
Conference_Title :
Fuzzy Systems Conference, 2007. FUZZ-IEEE 2007. IEEE International
Conference_Location :
London
Print_ISBN :
1-4244-1209-9
ISSN :
1098-7584
DOI :
10.1109/FUZZY.2007.4295437