Title :
Push-based system for molecular simulation data analysis
Author :
Vladimir Grupcev;Yi-Cheng Tu;Joseph Fogarty;Sagar Pandit
Author_Institution :
Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Ave., ENB 118 Tampa, FL 33620, U.S.A.
Abstract :
Many scientific fields generate, and require the manipulation of, big data. Known scientific data analysis systems, as well as traditional DBMSs, follow a pull-based architectural design, in which the executed queries determine which data are fetched. This design suits traditional transaction-based workloads, where queries retrieve small portions of data located at various places in the database. However, it is ill-suited to applications that perform complex analysis over most of the data: for such workloads it causes redundant and random I/O, which considerably reduces the data throughput of the system. In this paper, we design and implement a push-based system that enables high-throughput data analysis in the process of scientific discovery. Our design improves throughput in two ways: i) it uses a sequential scan-based I/O framework that loads the data into main memory, and ii) it then pushes the loaded data to a number of pre-programmed queries. In this way, the system avoids the unnecessary I/O overhead imposed by randomized, index-based scans, as well as the overhead of the multiple data reads that would be required if each query were fed separately. Considering the amount of data and the number of executed queries, we believe our system provides a substantial improvement over current data analysis systems. The efficiency of the proposed system is supported by the results of extensive experiments using real molecular simulation (MS) data, in which the running times of our system are compared to those of the GROMACS system. The comparison shows the advantage and potential of using such a push-based system for data analysis.
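The two-step design in the abstract (a sequential scan that loads data into memory, then pushing each loaded chunk to every pre-programmed query) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame layout, the query classes `FrameCounter` and `MeanCentroid`, the functions `sequential_scan` and `push_run`, and the `chunk_size` parameter are all hypothetical names chosen for illustration. The point it shows is that the number of chunk loads depends only on the data size, not on how many queries consume the data.

```python
from typing import Iterable, Iterator, List, Protocol, Tuple

# Hypothetical data model: one trajectory frame is a list of (x, y, z) atom positions.
Atom = Tuple[float, float, float]
Frame = List[Atom]


class Query(Protocol):
    """A pre-programmed query that receives data pushed to it, frame by frame."""

    def consume(self, frame: Frame) -> None: ...

    def result(self) -> object: ...


class FrameCounter:
    """Hypothetical query: counts how many frames were pushed to it."""

    def __init__(self) -> None:
        self.count = 0

    def consume(self, frame: Frame) -> None:
        self.count += 1

    def result(self) -> int:
        return self.count


class MeanCentroid:
    """Hypothetical query: averages the per-frame geometric center over all frames."""

    def __init__(self) -> None:
        self.sums = [0.0, 0.0, 0.0]
        self.frames = 0

    def consume(self, frame: Frame) -> None:
        n = len(frame)
        for axis in range(3):
            self.sums[axis] += sum(atom[axis] for atom in frame) / n
        self.frames += 1

    def result(self) -> Tuple[float, float, float]:
        return (self.sums[0] / self.frames,
                self.sums[1] / self.frames,
                self.sums[2] / self.frames)


def sequential_scan(frames: Iterable[Frame], chunk_size: int) -> Iterator[List[Frame]]:
    """Step i: read frames in storage order, one memory-sized chunk at a time."""
    chunk: List[Frame] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def push_run(frames: Iterable[Frame], queries: List[Query], chunk_size: int) -> int:
    """Step ii: push every loaded chunk to all queries; return the number of chunk loads."""
    loads = 0
    for chunk in sequential_scan(frames, chunk_size):
        loads += 1
        for frame in chunk:
            for query in queries:
                query.consume(frame)
    return loads


# Usage: three toy frames of two atoms each, scanned once and shared by two queries.
frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (0.0, 0.0, 2.0)],
    [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)],
]
counter, centroid = FrameCounter(), MeanCentroid()
loads = push_run(frames, [counter, centroid], chunk_size=2)
```

Here the three frames are loaded in two chunks, and both queries are served from those two loads. In a pull-based arrangement, each query would instead trigger its own read of the data.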
Keywords :
"Data analysis","Computational modeling","Biological system modeling","Big data","Analytical models","Databases","Throughput"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363949