Title :
Supporting Strong Reliability for Distributed Complex Event Processing Systems
Author :
Völz, Marco ; Koldehofe, Boris ; Rothermel, Kurt
Author_Institution :
Inst. of Parallel & Distrib. Syst., Univ. Stuttgart, Stuttgart, Germany
Abstract :
Many application classes, such as monitoring applications, involve processing a massive amount of data from a possibly huge number of data sources. Complex Event Processing (CEP) has evolved as the paradigm of choice to determine meaningful situations (complex events) by performing stepwise correlation over event streams. To keep up with the high scalability demands of growing input streams, recent approaches distribute event correlation over several correlation nodes. However, the failure of even a single correlation node impacts the correctness of the final correlation result. In this paper, we illustrate the importance of strong reliability semantics for CEP in the context of a monitoring application in a distributed production environment. Strong reliability ensures that each complex event is detected and delivered exactly once to each application entity; it cannot be guaranteed by the naive application of established replication principles. We present a replication scheme that ensures strong reliability in an asynchronous system model and can be applied to an arbitrary distributed CEP system. The algorithm tolerates f simultaneous failures by introducing f additional replicas for each correlation node. We prove its correctness and evaluate the overhead it introduces. Results show that the overhead scales linearly with the number of deployed replicas and the node failure rate.
Keywords :
distributed processing; manufacturing processes; process monitoring; system recovery; asynchronous system model; distributed CEP system; distributed complex event processing system; distributed production environment; event correlation node failure rate; monitoring application; reliability semantics; replication scheme; scalability demands; Correlation; Monitoring; Nominations and elections; Production; Reliability; Semantics; Upper bound; complex event processing; failure recovery; monitoring; reliability; replication;
Conference_Title :
High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on
Conference_Location :
Banff, AB
Print_ISBN :
978-1-4577-1564-8
Electronic_ISBN :
978-0-7695-4538-7
DOI :
10.1109/HPCC.2011.69