• DocumentCode
    528499
  • Title

    Handling single node failures using agents in computer clusters

  • Author

    Varghese, Blesson ; McKee, Gerard ; Alexandrov, Vassil

  • Author_Institution
    Sch. of Syst. Eng., Univ. of Reading, Reading, UK
  • fYear
    2010
  • fDate
    11-14 July 2010
  • Firstpage
    96
  • Lastpage
    101
  • Abstract
    This paper addresses the handling of single node failures for parallel summation algorithms in computer clusters. An agent-based approach is proposed in which a task to be executed is decomposed into sub-tasks and mapped onto agents that traverse computing nodes. The agents communicate across computing nodes to share information in the event of a predicted node failure. Two single node failure scenarios are considered. The Message Passing Interface is employed to implement the proposed approach. Quantitative results obtained from experiments reveal that the agent-based approach can handle failures more efficiently than traditional failure handling approaches.
  • Keywords
    failure analysis; message passing; multi-agent systems; parallel algorithms; pattern clustering; statistical analysis; computer cluster agent; information sharing; message passing interface; parallel summation algorithms; predicted node failure; single node failure handling; subtask decomposition; traverse computing nodes; Checkpointing; Fault tolerance; Fault tolerant systems; Hardware; Message passing; Middleware; Parallel processing; Agent-based failure handling; Cluster computing; Failure handling; Message Passing Interface; Single node failure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Evaluation of Computer and Telecommunication Systems (SPECTS), 2010 International Symposium on
  • Conference_Location
    Ottawa, ON
  • Print_ISBN
    978-1-56555-340-8
  • Type

    conf

  • Filename
    5588878