Conference record number:
2727
Article title:
A hybrid data mining approach for predicting breast cancer survivability on imbalanced SEER data set
Title in another language:
A hybrid data mining approach for predicting breast cancer survivability on imbalanced SEER data set
Authors:
Miri Rostami Samaneh (author), Shiraz University of Technology - Faculty of Computer Engineering & IT; Ahmadzadeh Marzieh (author), Shiraz University of Technology - Faculty of Computer Engineering & IT; Khayami Raouf (author), Shiraz University of Technology - Faculty of Computer Engineering & IT
Keywords:
Breast cancer, imbalanced learning problem, survival analysis, outlier, oversampling technique
Conference title:
First International Conference on New Research Achievements in Electrical and Computer Engineering
Abstract (English):
With advances in the diagnosis and treatment of breast cancer, the number of patients who survive now exceeds the number who die, so breast cancer data sets have become imbalanced. Imbalanced data is a challenging problem for data mining. In this study, we propose a hybrid approach to build a more accurate prediction model for the 5-year survivability of breast cancer patients in the presence of outliers and class imbalance. To achieve this, after preprocessing the data and dividing the data set into two classes, outliers in the minority class are first eliminated and the boundary of the minority class is then strengthened using Borderline-SMOTE. Three data mining techniques, Bayes Nets, decision tree (C4.5), and 1-nearest neighbor search, are then applied to the final improved data set. Accuracy, sensitivity, specificity, and G-mean were used to evaluate the performance of the proposed hybrid approach. The results showed that, among all combinations, the proposed approach with C4.5 performs best on accuracy, sensitivity, specificity, and G-mean, with 98.962%, 0.926, 0.989, and 0.956, respectively.
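The four assessment metrics named in the abstract can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' implementation: the function name `evaluate`, the choice of which class counts as "positive", and the example counts are all hypothetical and are not taken from the SEER data. G-mean is taken here as the geometric mean of sensitivity and specificity, its usual definition in imbalanced learning. On imbalanced data, accuracy alone can look high even when one class is poorly predicted, which is why sensitivity, specificity, and G-mean are reported alongside it.

```python
import math


def evaluate(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and G-mean.

    tp, fn, tn, fp are counts from a binary confusion matrix.
    Assumes each class has at least one sample (no zero denominators).
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total          # fraction of all cases classified correctly
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    g_mean = math.sqrt(sensitivity * specificity)  # geometric mean of the two rates
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = evaluate(tp=90, fn=10, tn=880, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because G-mean multiplies the two per-class rates, it drops sharply if either class is predicted poorly, whereas accuracy is dominated by the majority class.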
Conference document number:
4240260