Title :
A comprehensive evaluation of NoSQL datastores in the context of historians and sensor data analysis
Author :
Arun Kumar Kalakanti;Vinay Sudhakaran;Varsha Raveendran;Nisha Menon
Author_Institution :
Data-centric Systems Research Group, Siemens Corporate Research and Technologies, Siemens Technology and Services Pvt. Ltd., Bangalore, India
Abstract :
Data historians [1] are transitioning from their traditional role as record-keepers and planners to tools that provide the flexibility and responsiveness needed to meet customers' requirements for the type and volume of data stored, archived, and queried. These requirements are compounded by the need for high performance and scalability. Businesses are realizing that traditional database management systems, i.e., Relational Database Management Systems (RDBMS), might not be able to handle the deluge of industrial data they are experiencing. The emerging NoSQL paradigm offers different kinds of datastores, each of which addresses specific requirements such as improved performance, reliability, or user experience. Our study of two NoSQL datastores, HBase and Cassandra, provides the insights business units need to choose the right technology for their next-generation historian systems. To facilitate this study, we propose and use a benchmarking studio that can generate data for configurable schemas and workload patterns, enabling business use-case-specific evaluation of datastores while measuring key performance indicators. Two industrial cases, in plant automation and energy management, are considered for this evaluation. Efficient data modeling techniques and batching mechanisms are defined to store streaming time-series data from sensors and devices at a high throughput of approximately one million inserts/second. Mixed workload scenarios are considered to align with the requirements of next-generation historians. Detailed experiments evaluating concurrency and load management, scalability, consistency, BI query performance, and fault tolerance are performed on Amazon EC2 dedicated infrastructure to allow for reproducibility and verifiability in the future.
Keywords :
"Benchmark testing","Data models","Throughput","Business","Performance evaluation","Scalability"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363952