Predicting disease by using data mining based on healthcare information system

Author

Huang, Feixiang ; Wang, Shengyong ; Chan, Chien-Chung

Author_Institution

University of Akron, OH 44325, USA

fYear

2012

fDate

11-13 Aug. 2012

Firstpage

191

Lastpage

194

Abstract

This paper applies a data mining process to predict hypertension from patient medical records that also cover eight other diseases. A sample of 9,862 cases was studied, extracted from a real-world Healthcare Information System database containing 309,383 medical records. We observed that the distribution of patient diseases in the medical database is imbalanced. To address this, an under-sampling technique was applied to generate the training data sets, and the data mining tool Weka was used to build Naïve Bayesian and J-48 classifiers. In addition, an ensemble of five J-48 classifiers was created in an attempt to improve prediction performance, and rough set tools were used to reduce the ensemble based on the idea of second-order approximation. Experimental results showed a slight improvement of the ensemble approach over the individual Naïve Bayesian and J-48 classifiers in accuracy, sensitivity, and F-measure.
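
The under-sampling, ensemble voting, and evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' Weka pipeline: the function names (`undersample`, `majority_vote`, `binary_metrics`), the `hypertension` field, the toy records, and the use of a simple majority vote are all assumptions made for illustration, and the rough-set reduction of the ensemble is not shown.

```python
import random
from collections import Counter


def undersample(records, label_key, rng):
    """Balance classes by cutting every class down to the minority-class size.

    Hypothetical helper: the abstract does not say which under-sampling
    variant was used, so plain random under-sampling is assumed here.
    """
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


def majority_vote(predictions):
    """Combine the labels predicted by the ensemble members into one label."""
    return Counter(predictions).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall), and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f_measure": f_measure}


# Toy imbalanced data (assumed): 10 positive and 90 negative records.
rng = random.Random(0)
records = [{"id": i, "hypertension": 1 if i < 10 else 0} for i in range(100)]

# One balanced training set per ensemble member (five members, as in the abstract).
training_sets = [undersample(records, "hypertension", rng) for _ in range(5)]

# Hypothetical votes from five trained classifiers for a single patient.
ensemble_label = majority_vote([1, 1, 0, 1, 0])
```

Each call to `undersample` draws a different random subset of the majority class, so the five training sets differ even though each is balanced. That diversity is what gives an ensemble vote a chance to outperform a single classifier.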

Keywords

Accuracy; Area measurement; Bayesian methods; Humans; Immune system; Niobium; Sensitivity

fLanguage

English

Publisher

IEEE

Conference_Titel

Granular Computing (GrC), 2012 IEEE International Conference on

Conference_Location

Hangzhou, China

Print_ISBN

978-1-4673-2310-9

Type

conf

DOI

10.1109/GrC.2012.6468691

Filename

6468691