Title of article :
Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Author/Authors :
Miri Rostami, S. - Shiraz University of Technology ; Ahmadzadeh, M. - Shiraz University of Technology
Pages :
14
From page :
263
To page :
276
Abstract :
Data mining methods, used as decision support systems, are of great benefit in predicting the survival of new patients. They also offer health researchers considerable potential for investigating the relationship between risk factors and cancer survival. However, because the datasets associated with breast cancer survival are imbalanced, building accurate survival prognosis models remains a challenge. This work aims to develop a predictive model for the 5-year survivability of breast cancer patients and to discover the relationships between certain predictive variables and survival. The dataset was obtained from the SEER database. First, the effectiveness of two synthetic over-sampling methods, Borderline-Synthetic Minority Over-sampling Technique (Borderline-SMOTE) and Density-based Synthetic Oversampling (DSO), is investigated as a solution to the class imbalance problem. Then, a combination of Particle Swarm Optimization (PSO) and Correlation-based Feature Selection (CFS) is used to identify the most important predictive variables. Finally, to build a predictive model, three classifiers, decision tree (C4.5), Bayesian Network (BN), and Logistic Regression (LR), are applied to the datasets. Assessment metrics including accuracy, sensitivity, specificity, and G-mean are used to evaluate the performance of the proposed hybrid approach, and the area under the ROC curve (AUC) is used to evaluate the performance of the feature selection method. The results show that, among all combinations, DSO + PSO_CFS + C4.5 performs best in terms of accuracy, sensitivity, G-mean, and AUC, with values of 94.33%, 0.930, 0.939, and 0.939, respectively.
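The assessment metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions, with G-mean taken as the geometric mean of sensitivity and specificity; the record itself does not give the formulas. The function name `imbalance_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fn: positive-class cases predicted correctly / incorrectly.
    tn/fp: negative-class cases predicted correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Assumed definition: geometric mean of the two per-class rates.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = imbalance_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because G-mean multiplies the two per-class rates, it drops sharply when a model neglects either class, which is why it is often reported alongside accuracy on imbalanced data.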
Keywords :
Breast Cancer , survival , Class Imbalance Problem , over-sampling technique , Feature Selection
Journal title :
Journal of Artificial Intelligence Data Mining
Serial Year :
2018
Record number :
2449349