Title :
Applying Over-sampling Technique Based on Data Density and Cost-sensitive SVM to Imbalanced Learning
Author :
Cao, Qinghua ; Wang, Senzhang
Author_Institution :
Dept. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
Abstract :
The performance of SVMs is severely limited when they are applied to imbalanced datasets, in which the classes are not approximately equally represented. Real-world datasets are often composed of "normal" examples with only a small percentage of "abnormal" examples. Under-sampling the majority class and over-sampling the minority class are two obvious ways to balance a dataset before training. The SMOTE algorithm is a simple and effective over-sampling technique, but it ignores the data distribution and density information, which are important for synthesizing minority examples, and it cannot effectively eliminate the influence of noise. A novel over-sampling algorithm, SMOBD, is proposed and shows better performance in experiments. We also combine this algorithm with an SVM that uses different error costs. We compare the performance of our algorithm against regular SVM, SMOTE, SMOTE-ENN, and SDC (SMOTE with different costs of SVM), and the experimental results show that our algorithm outperforms all of them.
Keywords :
data handling; learning (artificial intelligence); support vector machines; SMOTE algorithm; cost-sensitive SVM; data density; data distribution; density information; imbalanced learning; oversampling technique application; Algorithm design and analysis; Arrays; Classification algorithms; Noise; Noise measurement; Support vector machines; Training; SMOBD; SMOTE
Conference_Title :
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-61284-450-3
DOI :
10.1109/ICIII.2011.276