  • DocumentCode
    226912
  • Title
    An under-sampling method based on fuzzy logic for large imbalanced dataset
  • Author
    Wong, Ginny Y.; Leung, Frank H. F.; Ling, Sai-Ho
  • Author_Institution
    Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hung Hom, China
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1248
  • Lastpage
    1252
  • Abstract
    Large imbalanced datasets make classification difficult: they lead to a high error rate on minority-class samples and a long training time for the classification model. Re-sampling and data size reduction have therefore become important pre-processing steps. In this paper, a sampling strategy for large imbalanced datasets is proposed, in which samples of the larger (majority) class are selected based on fuzzy logic. To reduce the data size further, the evolutionary computation method CHC is employed. The method is evaluated by training a Support Vector Machine (SVM) classification model on the re-sampled training sets. The experimental results show that the proposed method improves both the F-measure and the AUC. The complexity of the resulting classification models is also compared, and the proposed method is found to be superior to all the other methods compared.
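    The abstract reports results as F-measure and AUC on the minority class. As a reference for what these two metrics measure, the following is a minimal Python sketch of their standard definitions. It is not code from the paper; the function names `f_measure` and `auc` are illustrative, and treating label 1 as the minority (positive) class is an assumption.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive (assumed minority) class: 2PR / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0, so F1 is 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """ROC AUC as the probability that a random positive sample scores
    higher than a random negative one (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    For example, on labels `[0]*6 + [1, 1]` a classifier that always predicts the majority class reaches 75% accuracy but an F-measure of 0. This is why imbalanced-data studies typically report F-measure and AUC rather than plain accuracy.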
  • Keywords
    data reduction; evolutionary computation; fuzzy logic; pattern classification; support vector machines; AUC; CHC; F-measure; SVM; classification problems; data pre-processing; data re-sampling; data size reduction; evolutionary computational method; fuzzy logic; high error rate; large imbalanced dataset; long training time; minority class samples; re-sampled training sets; support vector machine; under-sampling method; Biological cells; Educational institutions; Fuzzy logic; Sociology; Statistics; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-2073-0
  • Type
    conf
  • DOI
    10.1109/FUZZ-IEEE.2014.6891771
  • Filename
    6891771