  • DocumentCode
    2092985
  • Title
    Supporting Strong Reliability for Distributed Complex Event Processing Systems
  • Author
    Völz, Marco ; Koldehofe, Boris ; Rothermel, Kurt
  • Author_Institution
    Inst. of Parallel & Distrib. Syst., Univ. Stuttgart, Stuttgart, Germany
  • fYear
    2011
  • fDate
    2-4 Sept. 2011
  • Firstpage
    477
  • Lastpage
    486
  • Abstract
    Many application classes, such as monitoring applications, involve processing a massive amount of data from a possibly huge number of data sources. Complex Event Processing (CEP) has evolved as the paradigm of choice to determine meaningful situations (complex events) by performing stepwise correlation over event streams. To keep up with the high scalability demands of growing input streams, recent approaches distribute event correlation over several correlation nodes. However, even the failure of a single correlation node impacts the correctness of the final correlation result. In this paper, we illustrate the importance of a strong reliability semantics for CEP in the context of a monitoring application in a distributed production environment. Strong reliability ensures that each complex event is detected and delivered exactly once to each application entity; it cannot be guaranteed by naively applying established replication principles. We present a replication scheme that ensures strong reliability in an asynchronous system model and can be applied to an arbitrary distributed CEP system. The algorithm tolerates f simultaneous failures by introducing f additional replicas for each correlation node. We prove its correctness and evaluate the overhead it introduces. Results show that the overhead scales linearly with the number of deployed replicas and with the node failure rate.
  • Keywords
    distributed processing; manufacturing processes; process monitoring; system recovery; asynchronous system model; distributed CEP system; distributed complex event processing system; distributed production environment; event correlation node failure rate; monitoring application; reliability semantics; replication scheme; scalability demands; Correlation; Monitoring; Nominations and elections; Production; Reliability; Semantics; Upper bound; complex event processing; failure recovery; monitoring; reliability; replication;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on
  • Conference_Location
    Banff, AB
  • Print_ISBN
    978-1-4577-1564-8
  • Electronic_ISBN
    978-0-7695-4538-7
  • Type
    conf
  • DOI
    10.1109/HPCC.2011.69
  • Filename
    6063028