Title :
A new over-sampling technique based on SVM for imbalanced diseases data
Author :
Jinjin Wang ; Yukai Yao ; Hanhai Zhou ; Mingwei Leng ; Xiaoyun Chen
Author_Institution :
Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China
Abstract :
In the real world, many disease data sets are imbalanced: most records come from normal (healthy) persons and only a minority from abnormal ones. Many researchers have ignored this imbalance, so their learning models are usually biased toward the majority normal class. To deal with this problem, a new over-sampling technique is proposed that over-samples the minority class to balance the data and to improve the performance of the Support Vector Machine (SVM) on imbalanced disease data sets. First, a K-Nearest Neighbor (KNN) graph is built for the minority class. Second, a Minimum Spanning Tree (MST) is derived from the graph. Third, synthetic samples are generated by applying SMOTE along the direct path in the tree. The proposed technique, combined with SVM, is evaluated on several disease data sets taken from the UCI machine learning repository, and the experiments show that it can improve the Sensitivity and G-Mean values.
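The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the function names (`knn_graph`, `minimum_spanning_forest`, `oversample`, `sensitivity_and_g_mean`), the value of `k`, the use of Euclidean distance, and the reading of "along the direct path in the tree" as SMOTE-style interpolation between the two endpoints of a tree edge are all assumptions made here for illustration. The metric helper uses the standard definitions, with G-Mean taken as the square root of sensitivity times specificity.

```python
import math
import random


def knn_graph(points, k):
    """Undirected edges (dist, i, j) linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            edges.add((d, min(i, j), max(i, j)))
    return sorted(edges)  # sorted by distance, ready for Kruskal


def minimum_spanning_forest(n, edges):
    """Kruskal's algorithm; gives a forest if the KNN graph is disconnected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def oversample(minority, n_new, k=3, seed=0):
    """Assumed reading: interpolate between the endpoints of a random tree edge."""
    tree = minimum_spanning_forest(len(minority), knn_graph(minority, k))
    if not tree:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(tree)
        gap = rng.random()  # SMOTE-style interpolation factor in [0, 1)
        a, b = minority[i], minority[j]
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic


def sensitivity_and_g_mean(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN); G-Mean = sqrt(sensitivity * specificity)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)  # assumes both classes occur in y_true
    specificity = tn / (tn + fp)
    return sensitivity, math.sqrt(sensitivity * specificity)
```

In use, the synthetic points returned by `oversample` would be appended to the minority class before training an SVM, and `sensitivity_and_g_mean` would be computed on held-out predictions; the classifier itself is left out of this sketch.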
Keywords :
diseases; learning (artificial intelligence); medical computing; sampling methods; support vector machines; trees (mathematics); G-mean value; K-nearest neighbor graph; KNN graph; MST; SMOTE; SVM; UCI machine learning repository; imbalanced disease data; minimum spanning tree; over-sampling technique; sensitivity value; support vector machine; Accuracy; Classification algorithms; Diabetes; Diseases; Medical diagnostic imaging; Sensitivity; Support vector machines; Imbalanced diseases data; Over-sampling; Support vector machine;
Conference_Title :
Mechatronic Sciences, Electric Engineering and Computer (MEC), Proceedings 2013 International Conference on
Conference_Location :
Shenyang
Print_ISBN :
978-1-4799-2564-3
DOI :
10.1109/MEC.2013.6885254