• DocumentCode
    235791
  • Title
    Processing and analysis of imbalanced liver cancer patient data by case-based reasoning
  • Author
    Yan-Bo Lin ; Xiao-Ou Ping ; Te-Wei Ho ; Feipei Lai
  • Author_Institution
    Grad. Inst. of Biomed. Electron. & Bioinf., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2014
  • fDate
    26-28 Nov. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Clinical data research is one of the fastest-growing fields worldwide. Most clinical datasets are imbalanced, which can bias the results of studies that rely on them. In this study, over-sampling and under-sampling are used to handle the data imbalance, and case-based reasoning (CBR) is used to develop classification models that predict the recurrence status of patients with liver cancer. The classification results on the two balanced datasets are compared with those on the original imbalanced dataset using standard indicators: sensitivity, specificity, balanced accuracy (BAC), positive predictive value (PPV), negative predictive value (NPV), and accuracy. According to the preliminary classification results, the mean BAC of the balanced datasets produced by under-sampling (66.07%) and over-sampling (54.24%) shows a significant improvement over that of the imbalanced dataset (48.33%). Most importantly, under-sampling achieves the highest mean accuracy of the three datasets (under-sampling: 66.76%, over-sampling: 53.47%, imbalanced: 48.58%). With under-sampling, mean PPV, NPV, and accuracy all exceed 65% (PPV: 65.44%, NPV: 69.44%, accuracy: 66.76%). Balanced datasets can benefit classification models and effectively reduce biased interpretations.
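    The balancing step and the indicators named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how cases were selected for removal or duplication, so plain random resampling is assumed here, and the function names (`under_sample`, `over_sample`, `indicators`) and the confusion-matrix counts in the usage note are hypothetical, not taken from the paper.

```python
import random


def _group_by_class(rows, labels):
    """Collect cases per class label (hypothetical helper)."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def under_sample(rows, labels, seed=0):
    """Randomly drop cases until every class matches the smallest class.

    Assumption: random selection; the paper may use a different rule.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    return [(row, label)
            for label, members in by_class.items()
            for row in rng.sample(members, target)]


def over_sample(rows, labels, seed=0):
    """Randomly duplicate cases until every class matches the largest class.

    Assumption: duplication with replacement; the paper may use a different rule.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((row, label) for row in members + extra)
    return balanced


def indicators(tp, fp, tn, fn):
    """Standard indicators from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "BAC": (sensitivity + specificity) / 2,  # balanced accuracy
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

    For example, `indicators(tp=8, fp=2, tn=6, fn=4)` uses made-up counts and gives a sensitivity of 8/12, a specificity of 6/8, and a BAC equal to their mean. Because BAC averages the two per-class rates, it is less flattered by a majority class than plain accuracy, which is why it suits comparisons across imbalanced and balanced datasets.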
  • Keywords
    cancer; case-based reasoning; data analysis; liver; medical computing; pattern classification; sampling methods; balanced accuracy; case based reasoning; case-based reasoning; classification models; clinical data research; imbalanced grouping dataset; imbalanced liver cancer patient data analysis; imbalanced liver cancer patient data processing; negative predictive value; over-sampling methods; positive predictive value; standard indicators; under-sampling methods; Accuracy; Atmospheric measurements; Bioinformatics; Calculators; Cancer; Particle measurements; Training; case-base reasoning; imbalanced dataset; liver cancer; over-sampling; under-sampling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering International Conference (BMEiCON), 2014 7th
  • Conference_Location
    Fukuoka
  • Type
    conf
  • DOI
    10.1109/BMEiCON.2014.7017371
  • Filename
    7017371