Title of article :
Machine learning-based methods for handling imbalanced data in hepatitis diagnosis
Author/Authors :
Orooji, Azam North Khorasan University of Medical Sciences, Bojnurd, Iran , Kermani, Farzaneh Semnan University of Medical Sciences - School of Allied Medical Sciences - Department of Health Information Technology, Semnan, Iran
Abstract :
Introduction: Hepatitis C virus (HCV) is the leading cause of mortality from liver disease, and diagnostic systems are useful tools for better disease control and management. The aim of this study was to design an HCV prediction system and to classify disease severity using data mining methods. Method: This applied study used the hepatitis C dataset from the UCI Machine Learning Repository. It was conducted in four steps: data preprocessing, data mining, evaluation, and system design. In data preprocessing, data balancing techniques were applied. Then, three data mining algorithms (multilayer perceptron, Bayesian network, and decision tree) were implemented, and 10-fold cross-validation was used to evaluate them. Finally, a user interface was designed in MATLAB (version 2016) based on the best algorithm. Results: The oversampling method improved the performance measures of the data mining algorithms in disease prediction; on the odataset, the accuracy of the best method (random forest) was 99.9%. Random forest on the odataset also had the best performance measures in terms of sensitivity, accuracy, and F-measure (99.9%), with a specificity of 100%. Conclusion: Because the presented approach performed better than all methods suggested in previous studies, the proposed system can be used effectively for diagnosing HCV and determining its severity.
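The balancing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract mentions MATLAB only for the user interface and does not name the oversampling variant). The function names `random_oversample` and `binary_metrics`, the choice of plain random oversampling with replacement, and the reduction to a two-class problem are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class rows
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: sensitivity, specificity, accuracy and F-measure
    from a two-class confusion matrix (binary case assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Toy data, not taken from the study: four majority rows, one minority row.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

When such a function is combined with k-fold cross-validation, it is commonly applied to the training folds only, so that duplicated rows do not appear in both the training and the test part of a fold.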
Keywords :
hepatitis C virus (HCV) , prediction , data mining , machine learning , imbalanced data
Journal title :
Frontiers in Health Informatics