Authors :
Tremblay, Monica Chiarini ; Berndt, Donald J. ; Studnicki, James
Abstract :
Measuring health outcomes is a difficult and potentially controversial undertaking. However, monitoring health outcomes can provide the basis for quality improvement initiatives, effective healthcare management, and even consumer education. Within an overall data mining process to predict health outcomes, the data preparation tasks pose significant challenges. This paper focuses on the data preparation methods that will support ongoing modeling efforts for health outcomes research. Features or attributes are organized into high-level categories, highlighting collections of potentially predictive variables. This important data mining activity is often given limited coverage, yet it provides a critical foundation for future research. Decision trees, regression, and histograms are used for feature selection, transformation, and data reduction. The paper illustrates these data preparation methods for a restricted surgical domain, the digestive system, with hospital length of stay as the outcome. Predictive models based on neural networks and support vector machines are presented.