• DocumentCode
    671693
  • Title
    A quasi-linear SVM combined with assembled SMOTE for imbalanced data classification
  • Author
    Bo Zhou ; Cheng Yang ; Haixiang Guo ; Jinglu Hu
  • Author_Institution
    Grad. Sch. of Inf., Production & Syst., Waseda Univ. of Hibikino, Kitakyushu, Japan
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper addresses the imbalanced dataset classification problem using an SVM together with an oversampling method. Traditional oversampling methods increase the overlap between classes, which leads to poor generalization of SVM classifiers. To solve this problem, this paper proposes a method that combines a quasi-linear SVM with an assembled SMOTE. The quasi-linear SVM is an SVM with a quasi-linear kernel function; it realizes an approximate nonlinear separation boundary by interpolating multiple local linear boundaries. The assembled SMOTE performs oversampling that takes the data distribution information into account and avoids overlap between classes. First, a partition method based on a minimal spanning tree is proposed to obtain local linear partitions, each of which can be separated by a single linear boundary. Second, the assembled SMOTE uses the information of the local linear partitions to generate synthetic minority class samples. Finally, the quasi-linear SVM classifies the oversampled dataset in the same way as a standard SVM, using a composite quasi-linear kernel function. Experimental results on artificial data and benchmark datasets show that the proposed method is effective and improves classification performance.
  • Keywords
    approximation theory; interpolation; pattern classification; sampling methods; support vector machines; trees (mathematics); approximate nonlinear separation boundary; artificial data datasets; assembled SMOTE; benchmark datasets; classification performance improvement; composite quasilinear kernel function; data distribution information; imbalanced dataset classification problem; interpolation; linear separation boundary; local linear partitioning method; minimal spanning tree; multilocal linear boundaries; oversampled dataset classification; oversampling method; quasilinear SVM; quasilinear kernel function; standard SVM; synthetic minority class samples; synthetic minority over-sampling technique; Interpolation; Kernel; Merging; Sociology; Standards; Statistics; Support vector machines;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2013.6707035
  • Filename
    6707035
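The abstract describes oversampling by SMOTE-style interpolation guided by local linear partitions. The following is a minimal sketch of that generation step only, under stated assumptions: the partition labels are taken as given (the MST-based partitioning is not reproduced), and the partition information is assumed to be used simply to restrict neighbour selection to minority samples in the same partition. `partition_smote` and its parameters are hypothetical names; this is standard SMOTE interpolation with a partition constraint, not the authors' assembled SMOTE.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _k_nearest(idx: int, pool: Sequence[int], X: Sequence[Point], k: int) -> List[int]:
    """Return up to k indices from `pool` closest (Euclidean) to X[idx], excluding idx."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def partition_smote(
    X_min: Sequence[Point],
    partition: Sequence[int],
    n_new: int,
    k: int = 5,
    seed: int = 0,
) -> List[Point]:
    """Generate n_new synthetic minority samples by SMOTE interpolation.

    Standard SMOTE picks a minority sample x and one of its k nearest minority
    neighbours x_nn, then returns x + u * (x_nn - x) with u ~ U(0, 1).
    ASSUMPTION (not the paper's exact rule): neighbours are restricted to
    minority samples sharing the same (hypothetical) partition label.
    """
    if len(X_min) != len(partition):
        raise ValueError("X_min and partition must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # Group minority sample indices by partition label.
    groups: Dict[int, List[int]] = {}
    for i, p in enumerate(partition):
        groups.setdefault(p, []).append(i)
    # Only samples with at least one same-partition neighbour can seed interpolation.
    seeds = [i for i in range(len(X_min)) if len(groups[partition[i]]) > 1]
    if not seeds:
        raise ValueError("no partition holds two minority samples; nothing to interpolate")
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        nn = rng.choice(_k_nearest(i, groups[partition[i]], X_min, k))
        u = rng.random()  # interpolation factor in [0, 1)
        x, y = X_min[i], X_min[nn]
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, y)))
    return synthetic
```

For example, with minority samples `[(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]` labelled as partitions `[0, 0, 1, 1]`, every synthetic point lies on a segment joining two samples of the same partition, so none is placed in the gap between the two groups. Constraining neighbours this way is one simple way to keep interpolated points inside a local region rather than bridging distant minority clusters.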