Title :
Predicting disease by using data mining based on healthcare information system
Author :
Huang, Feixiang ; Wang, Shengyong ; Chan, Chien-Chung
Author_Institution :
University of Akron, OH 44325, USA
Abstract :
This paper applies a data mining process to predict hypertension from patient medical records involving eight other diseases. A sample of 9862 cases was studied, extracted from a real-world Healthcare Information System database containing 309383 medical records. The distribution of patient diseases in the database was found to be imbalanced, so an under-sampling technique was applied to generate the training data sets, and the data mining tool Weka was used to build Naïve Bayesian and J-48 classifiers. In addition, an ensemble of five J-48 classifiers was created in an attempt to improve prediction performance, and rough set tools were used to reduce the ensemble based on the idea of second-order approximation. Experimental results showed a slight improvement of the ensemble approach over the individual Naïve Bayesian and J-48 classifiers in accuracy, sensitivity, and F-measure.
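The under-sampling step and the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' Weka workflow: the function names (`undersample`, `majority_vote`, `binary_metrics`) are hypothetical, random under-sampling down to the smallest class is only one common variant, and majority voting is an assumed combination rule, since the abstract does not state how the five J-48 classifiers were combined.

```python
import random
from collections import Counter


def undersample(records, label_of, seed=0):
    """Randomly drop records until every class is as small as the rarest class.

    Assumption: plain random under-sampling; the paper's exact scheme is not given.
    """
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def majority_vote(votes):
    """Combine ensemble predictions by majority (an assumed combination rule)."""
    return Counter(votes).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall), and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f_measure": f_measure}
```

With an odd number of binary classifiers, such as the five used here, a majority vote cannot tie. On imbalanced data, accuracy alone can look good while sensitivity stays low, which is presumably why the abstract also reports sensitivity and F-measure.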
Keywords :
Accuracy; Area measurement; Bayesian methods; Humans; Immune system; Niobium; Sensitivity;
Conference_Title :
Granular Computing (GrC), 2012 IEEE International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4673-2310-9
DOI :
10.1109/GrC.2012.6468691