• DocumentCode
    3058206
  • Title

    Towards Autonomic Fault Recovery in System-S

  • Author

    Jacques-Silva, Gabriela ; Challenger, Jim ; Degenaro, Lou ; Giles, James ; Wagle, Rohit

  • Author_Institution
    Univ. of Illinois at Urbana-Champaign, Urbana
  • fYear
    2007
  • fDate
    11-15 June 2007
  • Firstpage
    31
  • Lastpage
    31
  • Abstract
    System-S is a stream processing infrastructure that enables program fragments to be distributed and connected to form complex applications. There may be tens of thousands of interdependent and heterogeneous program fragments running across thousands of nodes. While the scale and interconnection imply the need for automation to manage the program fragments, the need is intensified because the applications operate on live streaming data and thus need to be highly available. System-S has been designed with components that autonomically manage the program fragments, but the system components themselves are also susceptible to failures that can jeopardize the system and its applications. The work we present addresses the self-healing nature of these management components in System-S. In particular, we show how one key component of System-S, the job management orchestrator, can be abruptly terminated and then recover without interrupting any of the running program fragments by reconciling with other autonomous system components. We also describe techniques that we have developed to validate that the system is able to autonomically respond to a wide variety of error conditions, including the abrupt termination and recovery of key system components. Finally, we show the performance of the job management orchestrator recovery for a variety of workloads.
  • Keywords
    data analysis; system recovery; autonomic fault recovery; heterogeneous program fragment; job management orchestrator recovery; program fragments; stream processing infrastructure; System-S; Application software; Automation; Control systems; Data analysis; Distributed computing; Electronic mail; Internet telephony; Resource management; Speech analysis; Streaming media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Autonomic Computing, 2007. ICAC '07. Fourth International Conference on
  • Conference_Location
    Jacksonville, FL
  • Print_ISBN
    0-7695-2779-5
  • Electronic_ISBN
    0-7695-2779-5
  • Type

    conf

  • DOI
    10.1109/ICAC.2007.40
  • Filename
    4273125