DocumentCode
1865890
Title
A Novel Differential Evolution-Clustering Hybrid Resampling Algorithm on Imbalanced Datasets
Author
Chen, Leichen ; Cai, Zhihua ; Chen, Lu ; Gu, Qiong
Author_Institution
Sch. of Comput., China Univ. of Geosci., Wuhan, China
fYear
2010
fDate
9-10 Jan. 2010
Firstpage
81
Lastpage
85
Abstract
When dealing with imbalanced datasets (IDS), the separating hyperplane of a support vector machine (SVM) tends to shift toward the minority (positive) class, which lowers classification accuracy. To address this problem, we propose a novel differential evolution-clustering hybrid resampling SVM algorithm (DEC-SVM). The algorithm uses mutation and crossover operators similar to those of Differential Evolution (DE) to over-sample the positive class and increase its proportion, and then applies clustering to the over-sampled training dataset as a data cleaning method for both classes, removing redundant or noisy samples. Experimental results on ten UCI standard datasets show that DEC-SVM outperforms standard SVM, SMOTE-SVM and DE-SVM in terms of F-measure and ROC area (AUC).
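The two resampling stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact operators or the cleaning rule, so everything here is an assumption: the classic DE mutation `a + F * (b - c)` with binomial crossover stands in for the "similar" operators, a plain k-means with a majority-label filter stands in for the clustering-based cleaning, and the names `de_oversample`, `kmeans`, `clean_by_cluster`, `F`, `CR` and `k` are hypothetical. The SVM training step is omitted.

```python
import random


def de_oversample(minority, n_new, F=0.5, CR=0.9, seed=0):
    """Create n_new synthetic minority samples with DE-style operators (assumed form)."""
    if len(minority) < 4:
        raise ValueError("need at least 4 minority samples")
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        # Pick a target x and three other distinct samples a, b, c.
        x, a, b, c = rng.sample(minority, 4)
        # Mutation: v = a + F * (b - c).
        mutant = [a[j] + F * (b[j] - c[j]) for j in range(dim)]
        # Binomial crossover; j_rand guarantees at least one mutant coordinate.
        j_rand = rng.randrange(dim)
        child = [
            mutant[j] if (rng.random() < CR or j == j_rand) else x[j]
            for j in range(dim)
        ]
        synthetic.append(child)
    return synthetic


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the cluster index assigned to each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda ci: sum((pj - cj) ** 2 for pj, cj in zip(p, centers[ci])),
            )
        for ci in range(k):
            members = [p for p, a in zip(points, assign) if a == ci]
            if members:  # keep the old center if a cluster becomes empty
                centers[ci] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def clean_by_cluster(points, labels, k=2, seed=0):
    """Assumed cleaning rule: drop samples whose 0/1 label disagrees with
    the majority label of their cluster."""
    assign = kmeans(points, k, seed=seed)
    keep = []
    for ci in range(k):
        idx = [i for i, a in enumerate(assign) if a == ci]
        if not idx:
            continue
        positives = sum(labels[i] for i in idx)
        majority = 1 if 2 * positives >= len(idx) else 0
        keep.extend(i for i in idx if labels[i] == majority)
    keep.sort()
    return [points[i] for i in keep], [labels[i] for i in keep]
```

In this sketch, the synthetic samples from `de_oversample` would be appended to the training set with label 1, and `clean_by_cluster` would then be applied to the combined set before fitting the classifier.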
Keywords
pattern clustering; sampling methods; support vector machines; F-measure criterion; ROC area criterion; clustering algorithm; crossover operators; data cleaning method; differential evolution; hybrid resampling algorithm; imbalanced datasets; minority class; mutation operators; support vector machine; Cleaning; Clustering algorithms; Data mining; Electronic mail; Geology; Intrusion detection; Learning systems; Signal to noise ratio; Support vector machine classification; Support vector machines; clustering; differential evolution; hybrid resampling; imbalanced datasets; support vector machine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
Conference_Location
Phuket
Print_ISBN
978-1-4244-5397-9
Electronic_ISBN
978-1-4244-5398-6
Type
conf
DOI
10.1109/WKDD.2010.48
Filename
5432725