Title :
Data preprocessing and mortality prediction: The Physionet/CinC 2012 challenge revisited
Author :
Johnson, Alistair E. W. ; Kramer, Andrew A. ; Clifford, Gari D.
Author_Institution :
Univ. of Oxford, Oxford, UK
Abstract :
The Physionet/CinC 2012 challenge focused on improving patient-specific mortality predictions in the intensive care unit (ICU). While most of the focus in the challenge was on applying sophisticated machine learning algorithms, little attention was paid to the preprocessing performed on the data a priori. We compare four standard preprocessing methods with a novel Box-Cox outlier rejection technique and analyze their effect on machine learning classifiers for predicting the mortality of ICU patients. The best machine learning model used the proposed preprocessing method and achieved an area under the receiver operating characteristic curve (AUROC) of 0.848. In general, models using our novel preprocessing method achieved a higher AUROC, with increases of as much as 0.02 in some cases. Furthermore, preprocessing raised the performance of regression models above that of non-linear techniques such as random forests. We demonstrate that proper preprocessing of the data prior to use in a prognostic model can significantly improve performance, and that this improvement can exceed that provided by more complex non-linear machine learning algorithms.
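The abstract names a Box-Cox outlier rejection technique but does not describe its details. The following is a minimal, generic Python sketch of how such a step could work, not the authors' method. It assumes that the Box-Cox parameter lambda is chosen by maximizing the profile log-likelihood over a fixed grid (here -3 to 3 in steps of 0.05), and that transformed values lying more than 3 standard deviations from the mean are flagged as outliers. The function names, the grid, and the threshold are hypothetical choices for illustration only.

```python
import math
from statistics import mean, pstdev


def boxcox(x, lam):
    """Box-Cox transform of one strictly positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)  # the lambda = 0 case is the natural log
    return (x ** lam - 1.0) / lam


def boxcox_loglik(data, lam):
    """Profile log-likelihood of lambda (up to an additive constant):
    -(n/2) * log(var of transformed data) + (lam - 1) * sum(log x)."""
    n = len(data)
    var = pstdev([boxcox(v, lam) for v in data]) ** 2
    if var == 0.0:
        return -math.inf  # degenerate: all transformed values identical
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in data)


def fit_lambda(data, grid=None):
    """Pick lambda by grid search (assumed grid: -3.00 .. 3.00, step 0.05)."""
    if grid is None:
        grid = [i / 20 for i in range(-60, 61)]
    return max(grid, key=lambda lam: boxcox_loglik(data, lam))


def boxcox_outlier_mask(data, z_thresh=3.0):
    """Return (lambda, mask) where mask[i] is True if data[i] is flagged.

    Hypothetical rule: after the Box-Cox transform, flag values whose
    z-score magnitude exceeds z_thresh (assumed default: 3.0).
    """
    if any(v <= 0 for v in data):
        raise ValueError("Box-Cox requires strictly positive values")
    lam = fit_lambda(data)
    y = [boxcox(v, lam) for v in data]
    mu, sd = mean(y), pstdev(y)
    if sd == 0.0:
        return lam, [False] * len(data)
    return lam, [abs((t - mu) / sd) > z_thresh for t in y]
```

For example, on 99 plausible heart-rate readings plus one implausible value of 1000, `boxcox_outlier_mask([70 + (i % 10) for i in range(99)] + [1000])` flags only the last reading. Transforming before thresholding lets a single z-score rule behave more sensibly on skewed physiological variables than thresholding the raw values would; the paper's actual choice of lambda estimation and cut-off may differ.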
Keywords :
cardiology; data mining; learning (artificial intelligence); medical computing; regression analysis; AUROC; Box-Cox outlier rejection technique; ICU patient mortality prediction; Physionet-CinC 2012 challenge; area under the receiver operating characteristic; complex nonlinear machine learning algorithms; computing in cardiology; intensive care unit; machine learning classifiers; prognostic model; random forests; regression model performance; sophisticated machine learning algorithms; standard pre-processing methods; Data models; Data preprocessing; Feature extraction; Heart rate; Predictive models; Support vector machines; Training;
Conference_Title :
Computing in Cardiology Conference (CinC), 2014
Print_ISBN :
978-1-4799-4346-3