DocumentCode
2892363
Title
A Dynamic Sampling Framework for Multi-class Imbalanced Data
Author
Debowski, B. ; Areibi, Shawki ; Grewal, Gary ; Tempelman, J.
Author_Institution
Sch. of Eng., Univ. of Guelph, Guelph, ON, Canada
Volume
2
fYear
2012
fDate
12-15 Dec. 2012
Firstpage
113
Lastpage
118
Abstract
In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so, the framework is able to search for a desirably sampled training set, thus eliminating the need to specify a target distribution and automatically tuning the training set distribution to the classification algorithm's learning preferences. This is important when re-sampling multi-class data, where manually searching for an appropriate target distribution would be a daunting task. We test both our Dynamic Sampling approach and traditional Static Sampling using RUS, ROS, SMOTE, ROS+RUS, and SMOTE+RUS with several classification algorithms on a four-class, highly imbalanced data set. We compare the results of Static Sampling and Dynamic Sampling and find that, overall, both techniques are able to raise Recall for the highest minority classes, but Dynamic Sampling is also able to maintain or raise Recall for the majority classes. Dynamic Sampling is also more robust and resilient overall, and is better able to sustain classifier Accuracy and to raise G-Mean and Minimum F-Measures.
Keywords
data mining; pattern classification; sampling methods; statistical distributions; G-Mean; ROS; RUS; SMOTE; classification algorithm; classification algorithm learning preferences; dynamic sampling; dynamic sampling framework; minimum F-measures; multiclass data re-sampling; multiclass imbalanced data; sampled training set; sampling process; sampling techniques; static sampling; target distribution; training set distribution; Accuracy; Algorithm design and analysis; Artificial neural networks; Educational institutions; Heuristic algorithms; Niobium; Training; Dynamic Sampling; Imbalanced Data; Multi-class
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location
Boca Raton, FL
Print_ISBN
978-1-4673-4651-1
Type
conf
DOI
10.1109/ICMLA.2012.144
Filename
6406737