Title :
Risk adjustment of patient expenditures: A big data analytics approach
Author :
Lin Li ; Bagheri, Saeed ; Goote, Helena ; Hasan, Aftab ; Hazard, Gregg
Author_Institution :
Philips Res. North America, Briarcliff Manor, NY, USA
Abstract :
For healthcare applications, voluminous patient data contain rich and meaningful insights that can be revealed using advanced machine learning algorithms. However, the volume and velocity of such high-dimensional data require a new big data analytics framework, because traditional machine learning tools cannot be applied directly. In this paper, we introduce our proof-of-concept big data analytics framework for developing a risk adjustment model of patient expenditures, which uses a “divide and conquer” strategy to exploit the big-yet-rich data and improve model accuracy. We leverage a distributed computing platform, e.g., MapReduce, to implement advanced machine learning algorithms on our data set. Specifically, the random forest regression algorithm, which is suitable for high-dimensional healthcare data, is applied to improve the accuracy of our predictive model. Our proof-of-concept framework demonstrates the effectiveness of predictive analytics using the random forest algorithm as well as the efficiency of the distributed computing platform.
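Illustrative sketch (not taken from the paper): the abstract does not say how the data are divided or how the per-partition models are combined, so the following is only one possible reading of a "divide and conquer" ensemble. It assumes the data are split into partitions, a small ensemble is trained on each partition independently (the "map" step), and the ensembles are pooled and their predictions averaged (the "reduce" step). Bagged one-split regression stumps stand in for a full random forest to keep the example short, and every function name here is hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression stump on (features, target) rows.

    Returns (feature_index, threshold, left_mean, right_mean).
    """
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            lm, rm = mean(left), mean(right)
            sse = (sum((y - lm) ** 2 for y in left)
                   + sum((y - rm) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No feature varies in this sample: always predict the sample mean.
        m = mean(y for _, y in rows)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    j, t, lm, rm = stump
    return lm if x[j] <= t else rm


def map_partition(rows, n_trees, rng):
    """'Map' step: train bagged stumps on bootstrap samples of one partition."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def divide_and_conquer_fit(rows, n_partitions, n_trees, seed=0):
    """Split rows into partitions, train on each, and pool the ensembles."""
    rng = random.Random(seed)
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    forest = []
    for part in partitions:  # on a cluster, each partition would go to one worker
        forest.extend(map_partition(part, n_trees, rng))
    return forest


def predict(forest, x):
    """'Reduce' step: average the predictions of all pooled stumps."""
    return mean(predict_stump(s, x) for s in forest)
```

With a hypothetical toy data set such as `rows = [([i], 100.0 if i < 5 else 500.0) for i in range(20)]`, calling `divide_and_conquer_fit(rows, n_partitions=2, n_trees=5)` returns ten pooled stumps, and `predict(forest, [0])` gives a lower expenditure estimate than `predict(forest, [19])`.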
Keywords :
Big Data; data analysis; distributed processing; divide and conquer methods; health care; learning (artificial intelligence); medical information systems; regression analysis; MapReduce; big data analytics approach; big-yet-rich data; distributed computing platform; divide and conquer strategy; healthcare applications; high dimensional data; high dimensional healthcare data; machine learning algorithms; machine learning tools; model accuracy; patient data; patient expenditures; predictive analytics; predictive model; proof-of-concept big data analytics framework; proof-of-concept framework; random forest regression algorithm; risk adjustment; Computational modeling; Data handling; Data models; Data storage systems; Information management; Linear regression; Predictive models; Distributed Computing; Healthcare Big Data; Patient Expenditure; Random Forest; Risk Adjustment;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691790