Title :
Decision Tree Based Predictive Models for Breast Cancer Survivability on Imbalanced Data
Author :
Liu Ya-Qin ; Wang Cheng ; Zhang Lu
Author_Institution :
Dept. of Biomed. Eng., Shanghai JiaoTong Univ., Shanghai, China
Abstract :
Decision tree predictive models for the 5-year survivability of breast cancer are proposed for imbalanced data. After preprocessing of the SEER breast cancer datasets, the class distribution of the data is clearly imbalanced. Under-sampling is used to offset the loss in model performance caused by the imbalanced data. Model performance is evaluated by the area under the ROC curve (AUC), accuracy, specificity, and sensitivity, using 10-fold stratified cross-validation. The models perform best when the class distribution is approximately equal. The bagging algorithm is used to build an ensemble decision tree model for predicting breast cancer survivability.
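The under-sampling and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `ratio` parameter, the coding of "survived" as label 1, and the toy data are all assumptions made for illustration, and decision tree training, bagging, and cross-validation are left out.

```python
import random


def undersample(rows, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class rows until majority is about ratio * minority.

    Hypothetical helper: ratio=1.0 gives an approximately equal class split.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = rng.sample(majority, keep_n) + minority
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def auc(y_true, scores, positive=1):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the sample size, so only
    suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data (assumed, not SEER): 90 "survived" (1), 10 "not survived" (0).
toy_rows = list(range(100))
toy_labels = [1] * 90 + [0] * 10
bal_rows, bal_labels = undersample(toy_rows, toy_labels, majority_label=1)
# bal_labels now holds 10 majority and 10 minority labels.
```

In a fuller pipeline, under-sampling would be applied to the training portion of each cross-validation fold only, so that the held-out fold keeps the original class distribution.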
Keywords :
biological organs; cancer; data mining; decision trees; gynaecology; medical computing; prediction theory; sampling methods; tumours; AUC; ROC curve; bagging algorithm; breast cancer survivability; data distribution; data mining; data preprocessing; decision tree; imbalanced data analysis; predictive model; under-sampling method; Bagging; Biomedical engineering; Breast cancer; Cleaning; Data mining; Data preprocessing; Decision trees; Dictionaries; Predictive models; Sensitivity;
Conference_Titel :
Bioinformatics and Biomedical Engineering , 2009. ICBBE 2009. 3rd International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2901-1
Electronic_ISBN :
978-1-4244-2902-8
DOI :
10.1109/ICBBE.2009.5162571