Title of article :
Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services
Author/Authors :
Chrimes, Dillon Vancouver Island Health Authority - Vancouver, Canada , Zamani, Hamid School of Health Information Science - Faculty of Human and Social Development - University of Victoria - Victoria, Canada
Abstract :
Big data analytics (BDA) is important for reducing healthcare costs. However, it poses many challenges in data aggregation,
maintenance, integration, translation, analysis, and security/privacy. The study objective was to establish an interactive BDA platform
with simulated patient data using open-source software technologies; this was achieved by constructing a platform framework with
Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated
from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to
HBase store files showed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a
week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce
limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance with high usability
for technical support but poor usability for clinical services. Modeling a hospital system on patient-centric data was challenging in
HBase, as not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we
recommend using HBase to keep patient data secure while querying entire hospital volumes in a simplified clinical event model
across clinical services.
Keywords :
HBase , Clinical , HDFS , BDA
Journal title :
Computational and Mathematical Methods in Medicine