Title :
Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
Author :
Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fDate :
5/1/2011
Abstract :
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
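As an illustration of how bagging can be combined with data sampling, the following is a minimal sketch of the idea commonly described as Exactly Balanced Bagging: each bag keeps every minority-class example and adds an equally sized random draw of majority-class examples, and the ensemble predicts by majority vote. It is not the authors' implementation. The function names, the single-feature threshold learner, and the default of 11 bags are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags, minority=1, seed=0):
    """Yield index lists: all minority examples plus an equal-sized
    random draw (without replacement) of majority examples.
    Assumes the minority class is no larger than the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    for _ in range(n_bags):
        yield minority_idx + rng.sample(majority_idx, len(minority_idx))


def fit_stump(xs, ys):
    """Hypothetical base learner (an assumption, not from the paper):
    pick the threshold t on a single feature that minimizes training
    errors for the rule 'predict 1 when x > t'."""
    best_errors, best_t = None, None
    for t in [float("-inf")] + sorted(set(xs)):
        errors = sum((x > t) != (y == 1) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return best_t


def fit_ensemble(xs, ys, n_bags=11, seed=0):
    """Train one stump per balanced bag and return the thresholds."""
    return [
        fit_stump([xs[i] for i in bag], [ys[i] for i in bag])
        for bag in balanced_bags(ys, n_bags, seed=seed)
    ]


def predict(thresholds, x):
    """Majority vote over the per-bag stumps."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]
```

With 90 majority and 10 minority examples, every bag produced this way holds 20 examples, 10 from each class, so each base learner trains on balanced data while the ensemble as a whole still sees many different majority examples.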
Keywords :
data mining; learning (artificial intelligence); pattern classification; RUSBoost; SMOTEBoost; bagging technique; binary class data; boosting technique; classification technique; data sampling; exactly balanced bagging; imbalanced data; learning algorithm; machine learning; noisy data; roughly balanced bagging; Bagging; Boosting; Neodymium; Noise; Noise measurement; Training data; binary classification; class imbalance; class noise
Journal_Title :
IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans
DOI :
10.1109/TSMCA.2010.2084081