• DocumentCode
    1865890
  • Title
    A Novel Differential Evolution-Clustering Hybrid Resampling Algorithm on Imbalanced Datasets
  • Author
    Chen, Leichen ; Cai, Zhihua ; Chen, Lu ; Gu, Qiong
  • Author_Institution
    Sch. of Comput., China Univ. of Geosci., Wuhan, China
  • fYear
    2010
  • fDate
    9-10 Jan. 2010
  • Firstpage
    81
  • Lastpage
    85
  • Abstract
    On imbalanced datasets (IDS), the separating hyperplane of a support vector machine (SVM) tends to skew toward the minority (positive) class, which lowers classification accuracy. To address this problem, we propose a novel differential evolution-clustering hybrid resampling SVM algorithm (DEC-SVM). The algorithm first over-samples the positive class using mutation and crossover operators similar to those of Differential Evolution (DE), increasing the proportion of positive samples. It then applies clustering to the over-sampled training set as a data-cleaning step for both classes, removing redundant or noisy samples. Experiments on ten standard UCI datasets show that DEC-SVM outperforms standard SVM, SMOTE-SVM, and DE-SVM in terms of F-measure and area under the ROC curve (AUC).
  • Keywords
    pattern clustering; sampling methods; support vector machines; F-measure criterion; ROC area criterion; clustering algorithm; crossover operators; data cleaning method; differential evolution; hybrid resampling algorithm; imbalanced datasets; minority class; mutation operators; support vector machine; Cleaning; Clustering algorithms; Data mining; Electronic mail; Geology; Intrusion detection; Learning systems; Signal to noise ratio; Support vector machine classification; Support vector machines; clustering; differential evolution; hybrid resampling; imbalanced datasets; support vector machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
  • Conference_Location
    Phuket
  • Print_ISBN
    978-1-4244-5397-9
  • Electronic_ISBN
    978-1-4244-5398-6
  • Type
    conf
  • DOI
    10.1109/WKDD.2010.48
  • Filename
    5432725
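
The two resampling stages described in the abstract (DE-style over-sampling of the positive class, then clustering-based cleaning of both classes) can be illustrated with a minimal sketch. The abstract does not give the operators' exact form, so everything specific here is an assumption: the DE/rand/1 mutation with binomial crossover, the parameter names `F` and `CR`, the use of a small k-means for the clustering step, and the rule of dropping samples that disagree with their cluster's majority class. The function names `de_oversample`, `kmeans_labels`, and `cluster_clean` are hypothetical, and this is not the authors' implementation.

```python
import random


def de_oversample(minority, n_new, F=0.5, CR=0.9, seed=0):
    """Create n_new synthetic minority samples (assumed DE/rand/1/bin scheme)."""
    if len(minority) < 4:
        raise ValueError("need at least 4 minority samples")
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        # Pick a target and three other mutually distinct samples.
        i, r1, r2, r3 = rng.sample(range(len(minority)), 4)
        target, a, b, c = minority[i], minority[r1], minority[r2], minority[r3]
        # Mutation: shift a base sample by a scaled difference vector.
        mutant = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
        # Binomial crossover: j_rand guarantees at least one mutant component.
        j_rand = rng.randrange(dim)
        child = [mutant[d] if (rng.random() < CR or d == j_rand) else target[d]
                 for d in range(dim)]
        synthetic.append(child)
    return synthetic


def _sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_labels(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_clean(points, labels, k, seed=0):
    """Assumed cleaning rule: drop samples that disagree with their cluster's majority class."""
    assign = kmeans_labels(points, k, seed=seed)
    kept_points, kept_labels = [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        positives = sum(1 for i in idx if labels[i] == 1)
        majority = 1 if positives * 2 >= len(idx) else 0
        for i in idx:
            if labels[i] == majority:
                kept_points.append(points[i])
                kept_labels.append(labels[i])
    return kept_points, kept_labels


if __name__ == "__main__":
    pos = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]   # minority class (label 1)
    neg = [[5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1],
           [5.0, 4.8], [4.8, 5.0], [5.3, 5.2], [0.2, 0.2]]   # majority class (label 0)
    new_pos = de_oversample(pos, n_new=len(neg) - len(pos))
    X = pos + new_pos + neg
    y = [1] * (len(pos) + len(new_pos)) + [0] * len(neg)
    X_clean, y_clean = cluster_clean(X, y, k=2)
    print(len(X), "samples before cleaning,", len(X_clean), "after")
```

The cleaned set would then be used to train an SVM. The classifier is left out so the sketch needs only the standard library.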