Title :
HFMS: Managing the lifecycle and complexity of hybrid analytic data flows
Author :
Simitsis, Alkis ; Wilkinson, K. ; Dayal, U. ; Meichun Hsu
Author_Institution :
HP Labs, Palo Alto, CA, USA
Abstract :
To remain competitive, enterprises are evolving their business intelligence systems to provide dynamic, near real-time views of business activities. To enable this, they deploy complex workflows of analytic data flows that access multiple storage repositories and execution engines and that span the enterprise and even outside the enterprise. We call these multi-engine flows hybrid flows. Designing and optimizing hybrid flows is a challenging task. Managing a workload of hybrid flows is even more challenging since their execution engines are likely under different administrative domains and there is no single point of control. To address these needs, we present a Hybrid Flow Management System (HFMS). It is an independent software layer over a number of independent execution engines and storage repositories. It simplifies the design of analytic data flows and includes optimization and executor modules to produce optimized executable flows that can run across multiple execution engines. HFMS dispatches flows for execution and monitors their progress. To meet service level objectives for a workload, it may dynamically change a flow's execution plan to avoid processing bottlenecks in the computing infrastructure. We present the architecture of HFMS and describe its components. To demonstrate its potential benefit, we describe performance results for running sample batch workloads with and without HFMS. The ability to monitor multiple execution engines and to dynamically adjust plans enables HFMS to provide better service guarantees and better system utilization.
Keywords :
competitive intelligence; data analysis; storage management; HFMS; business intelligence system; complexity management; executor module; hybrid analytic data flow; hybrid flow management system; independent execution engine; independent software layer; lifecycle management; multiengine flows hybrid flow; optimization module; storage repository; system utilization; workload management; Business; Connectors; Databases; Engines; Fault tolerance; Monitoring; Optimization;
Conference_Title :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-4909-3
ISSN :
1063-6382
DOI :
10.1109/ICDE.2013.6544907