Title : 
Flexpath: Type-Based Publish/Subscribe System for Large-Scale Science Analytics
         
        
            Author : 
Dayal, Jai ; Bratcher, Dick ; Eisenhauer, Greg ; Schwan, Karsten ; Wolf, Michael ; Zhang, Xuechen ; Abbasi, Hasan ; Klasky, Scott ; Podhorszki, Norbert
         
        
            Author_Institution : 
Georgia Inst. of Technol., Atlanta, GA, USA
         
        
            Abstract : 
As high-end systems move toward exascale sizes, a new model of scientific inquiry is emerging in which online data analytics run concurrently with the high-end simulations whose outputs they consume. The goals are to gain rapid insight into the ongoing scientific processes, assess their scientific validity, and/or initiate corrective or supplementary actions by launching additional computations when needed. The Flexpath system presented in this paper addresses the fundamental problem of how to structure, and efficiently implement, the communication between high-end simulations and concurrently running online data analytics, the latter composed of componentized dynamic services and service pipelines. Using a type-based publish/subscribe approach, Flexpath encourages diversity by permitting analytics services to differ in their computational and scaling characteristics and even in their internal execution models. Flexpath uses direct and MxN connections between interacting services for three purposes: to reduce data movement, to allow runtime connectivity changes as components arrive and depart, and to support the multiple underlying communication protocols needed by analytics workflows. In such workflows, simulation outputs may be processed by analytics services residing on the same nodes where the data is generated, elsewhere on the same machine, and/or on attached or remote analytics engines. This paper describes the design and implementation of Flexpath and evaluates it with two widely used scientific applications and their associated data analytics methods.
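The abstract refers to MxN connections, in which M simulation (writer) processes feed N analytics (reader) processes that may partition the data differently. The sketch below is a generic illustration of that idea, not Flexpath's implementation or API: it assumes a 1-D global array that is block-distributed on both sides, and computes which index range each writer must send to each reader. The function names `block_ranges` and `mxn_schedule` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating MxN redistribution; not part of Flexpath.

def block_ranges(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, n) into `parts` contiguous half-open ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` blocks get one more element
        ranges.append((start, start + size))
        start += size
    return ranges


def mxn_schedule(n: int, n_writers: int, n_readers: int) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map each (writer, reader) pair whose blocks overlap to the half-open index range to transfer."""
    writers = block_ranges(n, n_writers)
    readers = block_ranges(n, n_readers)
    schedule = {}
    for w, (w_start, w_end) in enumerate(writers):
        for r, (r_start, r_end) in enumerate(readers):
            lo, hi = max(w_start, r_start), min(w_end, r_end)
            if lo < hi:  # only pairs with a non-empty overlap need a connection
                schedule[(w, r)] = (lo, hi)
    return schedule


if __name__ == "__main__":
    # 10 elements written by 2 processes and read by 3 processes.
    for (w, r), (lo, hi) in sorted(mxn_schedule(10, 2, 3).items()):
        print(f"writer {w} -> reader {r}: indices [{lo}, {hi})")
```

For 10 elements, 2 writers, and 3 readers, only four of the six possible writer/reader pairs exchange data, which is why an MxN scheme can set up direct connections only where blocks overlap rather than funnelling everything through a single intermediary.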
         
        
            Keywords : 
data analysis; middleware; scientific information systems; Flexpath; MxN connections; analytics services; analytics workflows; attached analytics engines; communication protocols; component arrivals/departures; componentized dynamic services; computational characteristics; data movements; data outputs; exascale sizes; high end simulations; high-end systems; internal execution models; large-scale science analytics; online data analytics; remote analytics engines; runtime connectivity; scaling characteristics; scientific inquiry; scientific processes; scientific validity; service pipelines; Analytical models; Arrays; Computational modeling; Data models; Pipelines; Runtime; Subscriptions; Code Coupling; Data Analytics; Data Staging; Publish/Subscribe; Scalable I/O; in-Situ;
         
        
            Conference_Title : 
Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
         
        
            Conference_Location : 
Chicago, IL
         
        
            DOI : 
10.1109/CCGrid.2014.104