Title :
EasyEnsemble and Feature Selection for Imbalance Data Sets
Author_Institution :
Sch. of Electr., Shanghai Dianji Univ., Shanghai, China
Abstract :
Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, such as cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (mutual information based feature selection for EasyEnsemble) to address this problem and improve the generalization performance of the EasyEnsemble classifier. Experimental results on UCI data sets show that MIEE obtains better performance than asymmetric bagging and EasyEnsemble.
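The abstract names two ingredients without giving details: ranking features by mutual information with the class label, and EasyEnsemble-style sampling, in which each training subset keeps every minority example and adds an equally sized random draw from the majority class. The following is a minimal sketch of those two building blocks only, assuming discrete features. The function names (`mutual_information`, `select_features`, `balanced_subsets`) are hypothetical, and how MIEE actually combines the two steps is not stated in this record.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(rows, labels, k):
    """Return the indices of the k features sharing the most information with the label."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def balanced_subsets(labels, minority, n_subsets, seed=0):
    """Build index subsets: all minority examples plus an equal-sized majority sample."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return [
        minority_idx + rng.sample(majority_idx, len(minority_idx))
        for _ in range(n_subsets)
    ]
```

One plausible use, stated as an assumption rather than as the paper's procedure, is to call `select_features` on each subset returned by `balanced_subsets` and train one base classifier per subset on the selected features.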
Keywords :
data handling; EasyEnsemble; MIEE; feature selection; imbalance data sets; mutual information; Bagging; Bioinformatics; Biology computing; Diseases; Embryo; Intelligent systems; Machine learning; Sampling methods; Systems biology; unbalanced data sets
Conference_Title :
Bioinformatics, Systems Biology and Intelligent Computing, 2009. IJCBS '09. International Joint Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3739-9
DOI :
10.1109/IJCBS.2009.22