Title :
Ensemble of Cost-Sensitive Hypernetworks for Class-Imbalance Learning
Author :
Jin Wang ; Ping-li Huang ; Kai-wei Sun ; Bao-lin Cao ; Rui Zhao
Author_Institution :
Chongqing Key Lab. of Comput. Intell., Chongqing Univ. of Posts & Telecommun., Chongqing, China
Abstract :
A hypernetwork is a probabilistic graphical model of learning and memory inspired by biomolecular networks, and it is well suited to discovering higher-order correlations among multiple attributes. However, like many traditional machine learning algorithms, hypernetworks may be biased toward the majority class and therefore yield poor predictive accuracy on the minority class when trained on imbalanced datasets. In this paper, three hypernetwork-based models are proposed: an ensemble of cost-sensitive hypernetworks (EN-CS-HN), an ensemble of cost-sensitive hypernetworks with under-sampling (EN-CS-HN-UNDE), and an ensemble of cost-sensitive hypernetworks with the synthetic minority over-sampling technique (EN-CS-HN-SMOTE). To examine the performance of the proposed schemes, we conduct experiments on ten imbalanced datasets from the UCI Machine Learning Repository, comparing the proposed methods with various state-of-the-art approaches on three metrics: G-Mean, F-Measure, and the area under the receiver operating characteristic curve (AUC-ROC). Experimental results show that the proposed methods surpass or match the best previously known algorithms on most of the ten datasets.
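The abstract names its three evaluation metrics without defining them. The following is a minimal sketch of their standard textbook definitions, assuming binary labels with the minority class coded as 1. The function names are hypothetical, and the paper's exact conventions (for example the F-Measure beta or how AUC is computed) may differ.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating label 1 as the minority (positive) class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity * specificity) ** 0.5


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score of minority-class precision and recall (beta=1 gives F1)."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:  # O(|pos| * |neg|) pairwise comparison; fine for a sketch
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A classifier that always predicts the majority class scores 75% accuracy here,
    # but G-Mean and F-Measure both drop to 0, which is why such metrics are used.
    y_true = [1, 0, 0, 0]
    always_majority = [0, 0, 0, 0]
    print(g_mean(y_true, always_majority), f_measure(y_true, always_majority))
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Unlike plain accuracy, G-Mean and F-Measure collapse when the minority class is ignored, and AUC-ROC evaluates the ranking of scores independently of any single decision threshold.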
Keywords :
graph theory; learning (artificial intelligence); sampling methods; AUC-ROC; EN-CS-HN-SMOTE; EN-CS-HN-UNDE; F-measure; G-mean; UCI machine learning repository; area under the receiver operating characteristic curve; biomolecular networks; class-imbalance learning; cost-sensitive hyper networks with synthetic minority over-sampling technique; cost-sensitive hyper networks with under-sampling; cost-sensitive hypernetworks; higher-order correlations; hyper network-based models; imbalanced datasets; machine learning algorithms; multiple attributes; predictive accuracy; probabilistic graphic model; Genetic algorithms; Libraries; Machine learning algorithms; Measurement; Probabilistic logic; Training; Training data; SMOTE; cost-sensitive learning; ensemble learning; hypernetworks; imbalanced classification; under-sampling;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
DOI :
10.1109/SMC.2013.324