Title of article :
A Predictive Model for Mortality of Patients with Thalassemia using Logistic Regression Model and Genetic Algorithm
Author/Authors :
Hajipour, Mahmoud Student Research Committee - Dept. of Epidemiology - School of Public Health - Shahid Beheshti University of Medical Sciences, Tehran , Etminani, Kobra Department of Medical Informatics - Faculty of Medicine - Mashhad University of Medical Sciences , Rahmatinejad, Zahra Department of Medical Informatics - Faculty of Medicine - Mashhad University of Medical Sciences , Soltani, Maryam Razi Clinical Research Development Unit (RCRDU) - Birjand University of Medical Sciences (BUMS), Birjand , Etemad, Koorosh Department of Epidemiology - Environmental and Occupational Hazards Control Research Center - School of Public Health - Shahid Beheshti University of Medical Sciences, Tehran , Eslami, Saeid Department of Medical Informatics - Faculty of Medicine - Mashhad University of Medical Sciences , Golabpour, Amin School of Medicine - Shahroud University of Medical Sciences
Abstract :
Background: Because thalassemia causes severe complications, predicting
patient mortality or survival is of great importance in the early
phases of treatment. The purpose of this study was to predict mortality
in patients with thalassemia major and thalassemia intermedia by
combining a binary logistic regression algorithm with a genetic
algorithm.
Methods: This retrospective cohort study was conducted on 909
thalassemia patients, using a questionnaire, during 2004-2014. Data
from all patients referred to Imam Reza Hospital from 2004 to 2014
were considered. The outcome (dependent) variable was the death or
survival of the patient. Missing data were imputed using both the
proposed data mining model and the MICE algorithm. In total, 100
patients were excluded from the study because of missing or
out-of-range data. A model for predicting patient mortality was
implemented in MATLAB.
Results: The mean age of the thalassemia patients was 25.7±9.04 years,
and by the end of the study death had been reported in 185 subjects.
There were 26 independent variables. The mean number of missing
variables per patient was 1.8±0.81. The combined predictive model
predicted patient survival with 94.35% accuracy. The 26 independent
variables, which were derived from 12 variables, were found to be
predictors of patient mortality. In addition, missing data imputation
is an important method for increasing the efficiency of data mining
algorithms.
Conclusions: According to the results of this study, the proposed
missing-data algorithm, aided by data analysis, yielded more accurate
results than the MICE algorithm. Furthermore, 12 parameters, which
were extracted by the genetic algorithm, affected the prediction of
patient mortality. The accuracy of the predictive model in detecting
patient death was favorable. Consequently, this model is recommended
for predicting patient mortality.
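
The combination of a genetic algorithm with binary logistic regression
described in the abstract can be illustrated roughly as follows. This
is only a minimal sketch, not the authors' MATLAB model: it assumes the
genetic algorithm searches over feature subsets (bit masks) and scores
each subset by the validation accuracy of a logistic regression fitted
on it. The data are synthetic, and all function names, population
sizes, and rates are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Binary logistic regression by batch gradient descent (bias first)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    """Fraction of correct predictions at a 0.5 probability threshold."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of a model that uses only the masked features."""
    if not mask.any():
        return 0.0
    w = fit_logistic(X_tr[:, mask], y_tr)
    return accuracy(w, X_va[:, mask], y_va)

def select_features(X_tr, y_tr, X_va, y_va,
                    pop_size=20, generations=15, p_mut=0.05):
    """Toy genetic algorithm over boolean feature masks."""
    n = X_tr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best_mask, best_fit = pop[i].copy(), float(fits[i])
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with a shuffled partner, then bit-flip mutation.
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n)) < 0.5
        children = np.where(cross, parents, partners)
        pop = children ^ (rng.random((pop_size, n)) < p_mut)
        pop[0] = best_mask  # elitism: keep the best mask seen so far
    return best_mask, best_fit

# Synthetic stand-in data: 26 candidate predictors, 3 of them informative.
X = rng.normal(size=(400, 26))
logit = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

mask, acc = select_features(X[:300], y[:300], X[300:], y[300:])
print("selected features:", np.flatnonzero(mask))
print("validation accuracy:", acc)
```

In this sketch the accuracy used as fitness is also the accuracy that
is reported, so it is optimistically biased; a separate held-out test
set or cross-validation would be needed for an honest estimate.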
Keywords :
Thalassemia , Regression , Missing data , Data mining
Journal title :
Astroparticle Physics