• DocumentCode
    2837304
  • Title

    Applying Over-sampling Technique Based on Data Density and Cost-sensitive SVM to Imbalanced Learning

  • Author

    Cao, Qinghua ; Wang, Senzhang

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
  • Volume
    2
  • fYear
    2011
  • fDate
    26-27 Nov. 2011
  • Firstpage
    543
  • Lastpage
    548
  • Abstract
    The performance of SVM is greatly limited when it is applied to imbalanced datasets, in which the classification categories are not approximately equally represented. Real-world datasets are often composed of "normal" examples with only a small percentage of "abnormal" examples. Under-sampling the majority class and over-sampling the minority class are two obvious ways to balance a dataset before training. The SMOTE algorithm is a simple and effective over-sampling technique, but it ignores data distribution and density information, which are important for synthesizing minority examples, and it cannot effectively eliminate the influence of noise. A novel over-sampling algorithm, SMOBD, is proposed and shows better performance in experiments. We also combine this algorithm with an SVM that uses different error costs. We compare the performance of our algorithm against regular SVM, SMOTE, SMOTE-ENN, and SDC (SMOTE with different costs of SVM), and the experimental results show that our algorithm outperforms all of them.
  • Keywords
    data handling; learning (artificial intelligence); support vector machines; SMOTE algorithm; cost-sensitive SVM; data density; data distribution; density information; imbalanced learning; oversampling technique application; Algorithm design and analysis; Arrays; Classification algorithms; Noise; Noise measurement; Training; SMOBD; SMOTE
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-61284-450-3
  • Type

    conf

  • DOI
    10.1109/ICIII.2011.276
  • Filename
    6116764
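
The abstract describes SMOTE as an over-sampling technique that synthesizes minority examples. As a rough illustration of that baseline idea only, the following minimal Python sketch interpolates between a minority sample and one of its nearest minority neighbours. It is not the SMOBD algorithm proposed in the paper, whose density-based details are not given in this record. The function name `smote_sketch` and the parameters `n_new`, `k`, and `seed` are hypothetical names chosen for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built by plain SMOTE-style interpolation.

    Assumption: `minority` is a list of equal-length numeric tuples.
    This is a baseline sketch, not the paper's SMOBD method.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_sketch(minority, n_new=3, k=2):
        print(point)
```

Because each synthetic point lies on a segment between two existing minority samples, plain interpolation of this kind pays no attention to local density or to noisy samples, which is the limitation the abstract attributes to SMOTE. The cost-sensitive SVM side is not shown here; in a library such as scikit-learn it would typically be approximated with per-class penalty weights (for example the `class_weight` parameter of `SVC`).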