Title of article
AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE
Author/Authors
Wang, Xin (School of Computer Information Security, Guilin University of Electronic Technology, Guilin, China); Yang, Yue (School of Computer Information Security, Guilin University of Electronic Technology, Guilin, China); Chen, Mingsong (Beihai Campus, Guilin University of Electronic Technology, Beihai, China); Wang, Qin (Beihai Campus, Guilin University of Electronic Technology, Beihai, China); Qin, Qin (Beihai Campus, Guilin University of Electronic Technology, Beihai, China); Jiang, Hua (School of Computer Information Security, Guilin University of Electronic Technology, Guilin, China); Wang, Huijiao (School of Computer Information Security, Guilin University of Electronic Technology, Guilin, China)
Pages
9
From page
1
To page
9
Abstract
To address the low classification accuracy on imbalanced datasets, this paper proposes AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), an oversampling algorithm based on hierarchical clustering and an improved SMOTE. Its key steps are: (1) hierarchically cluster the majority samples and the minority samples separately; (2) divide the minority subclusters on the basis of the obtained majority subclusters; (3) select a “seed sample” according to the sampling weight and probability distribution of each minority subcluster; and (4) use a centroid method to restrict newly generated samples to a bounded region during sampling. AGNES-SMOTE is then combined with an SVM (Support Vector Machine) to classify imbalanced datasets. Experiments on UCI datasets compare its performance with that of other algorithms from the literature. The results indicate that AGNES-SMOTE synthesizes new samples effectively and improves SVM classification performance on imbalanced datasets.
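The clustering, seed-selection, and centroid-restricted generation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`agnes`, `oversample`), the inverse-size subcluster weighting, and the rule "interpolate between the seed and its subcluster centroid" are all assumptions made for illustration, and the step that divides minority subclusters using the majority subclusters is omitted.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def agnes(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def oversample(minority, n_new, n_clusters=2, seed=0):
    """Generate n_new synthetic minority samples (hypothetical scheme)."""
    rng = random.Random(seed)
    clusters = agnes(minority, n_clusters)
    # Assumed weighting: smaller subclusters are sampled more often.
    weights = [1.0 / len(c) for c in clusters]
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choices(clusters, weights=weights, k=1)[0]
        seed_sample = rng.choice(cluster)
        center = centroid(cluster)
        gap = rng.random()  # interpolation factor in [0, 1)
        # Assumed centroid restriction: move the seed toward its subcluster
        # centroid, so the new point stays inside that subcluster.
        synthetic.append([s + gap * (c - s) for s, c in zip(seed_sample, center)])
    return synthetic

minority = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
new_samples = oversample(minority, n_new=20, n_clusters=2)
```

Because each synthetic point lies on the segment between a seed and the centroid of its own subcluster, no point is generated in the gap between the two minority groups, which is the failure mode that plain SMOTE interpolation between arbitrary minority neighbors can produce.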
Keywords
AGNES-SMOTE, SMOTE, Oversampling Algorithm, Hierarchical Clustering
Journal title
Scientific Programming
Serial Year
2020
Record number
2610331